Modeling the survival times of the COVID-19 patients with a new statistical model: A case study from China

Over the past few months, the spread of the COVID-19 epidemic has caused tremendous damage worldwide and has destabilized many countries economically. Detailed scientific analysis of this event is still underway. However, it is very important to have the right facts and figures in order to take all the actions needed to contain COVID-19. In the practice and application of big data sciences, it is always of interest to provide the best description of the data under consideration. Recent studies have shown the potential of statistical distributions in modeling data in applied sciences, especially in medical science. In this article, we continue this line of research and introduce a new statistical model called the arcsine modified Weibull distribution. The proposed model is constructed from the modified Weibull distribution using the arcsine-X approach, which is based on a trigonometric strategy. The maximum likelihood estimators of the parameters of the new model are obtained, and the performance of these estimators is assessed by conducting a Monte Carlo simulation study. Finally, the effectiveness and utility of the arcsine modified Weibull distribution are demonstrated by modeling COVID-19 patient data. The data set represents the survival times of fifty-three patients taken from a hospital in China. The practical application shows that the proposed model outperforms the competing models and can be chosen as a good candidate distribution for modeling COVID-19 and other related data sets.


Introduction
The first outbreak of the current COVID-19 epidemic was seen in a popular seafood market in the Chinese city of Wuhan, where large numbers of people come to buy or sell seafood. As of December 31, 2019, a total of 27 cases of COVID-19 had been reported. In the current situation, it is of great interest to study COVID-19 further and to make comparisons between different countries. In the domain and practice of big data science, providing the best description of the data under consideration is a prominent research topic. Recent studies have pointed out the applicability of statistical models for describing random phenomena. In this article, we focus on this area of distribution theory and introduce a new statistical model that provides a good fit to data linked with COVID-19 and other related events.
The modified Weibull distribution is one of the most prominent modifications of the Weibull distribution; it was introduced to improve the fitting power of the exponential, Rayleigh, linear failure rate and Weibull distributions; see [28]. We extend this area of distribution theory and introduce a new version of the modified Weibull distribution to improve its fitting power. A random variable X is said to follow the modified Weibull distribution with shape parameter $\alpha$ and scale parameters $\kappa_1$ and $\kappa_2$ if its cdf (cumulative distribution function), denoted $F(x;\Xi)$, is given by
$$F(x;\Xi) = 1 - e^{-\kappa_1 x^{\alpha} - \kappa_2 x}, \qquad x \ge 0,\ \alpha, \kappa_1, \kappa_2 > 0, \tag{1}$$
where $\Xi = (\alpha, \kappa_1, \kappa_2)$. The pdf (probability density function) corresponding to Eq 1 is
$$f(x;\Xi) = \left(\alpha \kappa_1 x^{\alpha-1} + \kappa_2\right) e^{-\kappa_1 x^{\alpha} - \kappa_2 x}, \qquad x > 0.$$
In this article, we focus on proposing a new modification of the modified Weibull distribution called the arcsine modified Weibull (ASM-Weibull) distribution. The ASM-Weibull distribution is introduced by adopting the approach of the arcsine-X distributions of [29], which can be obtained as a sub-case of [30]. The cdf of the arcsine-X distributions is given by
$$G(x) = \frac{2}{\pi} \arcsin\left(F(x;\Xi)\right), \qquad x \in \mathbb{R}, \tag{2}$$
where $F(x;\Xi)$ is the cdf of the baseline random variable. The respective pdf is
$$g(x) = \frac{2}{\pi} \frac{f(x;\Xi)}{\sqrt{1 - F(x;\Xi)^2}}, \qquad x \in \mathbb{R}.$$
The cdf of the proposed ASM-Weibull distribution is obtained by using expression 1 in 2. The flexibility and applicability of the ASM-Weibull distribution are examined via an application to the survival times of COVID-19 patients.
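To make the construction concrete, the following is a minimal Python sketch of the ASM-Weibull cdf and pdf obtained by substituting the modified Weibull cdf $F(x) = 1 - e^{-\kappa_1 x^{\alpha} - \kappa_2 x}$ into the arcsine-X cdf $G(x) = (2/\pi)\arcsin(F(x))$. The function names are illustrative assumptions, not part of the original study (which used R). The pdf is written in an algebraically equivalent form that behaves better numerically in the right tail.

```python
import math


def mw_cdf(x, alpha, k1, k2):
    """Baseline modified Weibull cdf: F(x) = 1 - exp(-k1*x**alpha - k2*x), x >= 0."""
    if x <= 0.0:
        return 0.0
    # -expm1(-z) equals 1 - exp(-z) and stays accurate for small z.
    return -math.expm1(-(k1 * x**alpha + k2 * x))


def asm_weibull_cdf(x, alpha, k1, k2):
    """ASM-Weibull cdf: G(x) = (2/pi) * arcsin(F(x))."""
    return (2.0 / math.pi) * math.asin(mw_cdf(x, alpha, k1, k2))


def asm_weibull_pdf(x, alpha, k1, k2):
    """ASM-Weibull pdf: g(x) = (2/pi) * f(x) / sqrt(1 - F(x)**2), x > 0.

    With z = k1*x**alpha + k2*x, 1 - F**2 = exp(-z) * (2 - exp(-z)), hence
    g(x) = (2/pi) * (alpha*k1*x**(alpha-1) + k2) * exp(-z/2) / sqrt(2 - exp(-z)).
    This equivalent form avoids a 0/0 evaluation for large x.
    """
    if x <= 0.0:
        return 0.0
    z = k1 * x**alpha + k2 * x
    rate = alpha * k1 * x ** (alpha - 1.0) + k2
    return (2.0 / math.pi) * rate * math.exp(-0.5 * z) / math.sqrt(2.0 - math.exp(-z))
```

For example, `asm_weibull_cdf(1.0, 1.2, 1.4, 0.5)` evaluates the cdf at x = 1 for one of the parameter settings used later in the simulation study.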

Basic mathematical properties
This section deals with the computation of some statistical properties of the ASM-Weibull distribution.

Quantile function
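Let X denote the ASM-Weibull random variable with cdf 3. Then the qf (quantile function) of X, denoted Q(u), is the solution $x = Q(u)$ of the nonlinear equation
$$\kappa_1 x^{\alpha} + \kappa_2 x + \log\left(1 - \sin\left(\frac{\pi u}{2}\right)\right) = 0,$$
where u has the uniform distribution on the interval (0,1).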
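Because the quantile equation $\kappa_1 x^{\alpha} + \kappa_2 x + \log(1 - \sin(\pi u/2)) = 0$ has no closed-form solution in general, it has to be solved numerically. The sketch below uses plain bisection, which is sufficient because the left-hand side is increasing in x; it is an illustrative assumption and not the solver used in the original study. The function names and the `seed` argument are hypothetical. The identity $1 - \sin(\pi u/2) = 2\sin^2(\pi(1-u)/4)$ is used to keep the logarithm well defined when u is very close to 1.

```python
import math
import random


def asm_weibull_quantile(u, alpha, k1, k2, tol=1e-12):
    """Solve k1*x**alpha + k2*x + log(1 - sin(pi*u/2)) = 0 for x by bisection."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    # 1 - sin(pi*u/2) = 2*sin(pi*(1-u)/4)**2, which stays positive near u = 1.
    one_minus_sin = 2.0 * math.sin(0.25 * math.pi * (1.0 - u)) ** 2
    target = -math.log(one_minus_sin)

    def h(x):
        return k1 * x**alpha + k2 * x - target

    lo, hi = 0.0, 1.0
    while h(hi) < 0.0:  # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def asm_weibull_sample(n, alpha, k1, k2, seed=None):
    """Draw n values by inversion: X = Q(U) with U uniform on (0, 1)."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # random() can return exactly 0.0, which is outside (0, 1)
            draws.append(asm_weibull_quantile(u, alpha, k1, k2))
    return draws
```

Calling `asm_weibull_quantile(0.5, alpha, k1, k2)` gives the median, and `asm_weibull_sample` illustrates how the same routine generates random variates by inversion.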

Moments
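This subsection deals with the computation of the r th moment of the ASM-Weibull distribution, which is often employed in computing the main properties and characteristics of the distribution (for example, central tendency, dispersion, skewness, and kurtosis). The r th moment of the ASM-Weibull distribution is derived as follows:
$$\mu'_r = \int_0^{\infty} x^r g(x;\Xi)\,dx = \frac{2}{\pi} \int_0^{\infty} x^r \frac{\left(\alpha \kappa_1 x^{\alpha-1} + \kappa_2\right) e^{-\kappa_1 x^{\alpha} - \kappa_2 x}}{\sqrt{1 - \left(1 - e^{-\kappa_1 x^{\alpha} - \kappa_2 x}\right)^2}}\,dx. \tag{6}$$
Using the binomial series, which is convergent when |t| < 1 (see https://socratic.org/questions/how-do-you-use-the-binomial-series-to-expand-f-x-1-sqrt-1-x-2), we have
$$\frac{1}{\sqrt{1 - t^2}} = \sum_{n=0}^{\infty} \binom{2n}{n} \frac{t^{2n}}{4^n}. \tag{7}$$
Using 7 with $t = 1 - e^{-\kappa_1 x^{\alpha} - \kappa_2 x}$, we have
$$\frac{1}{\sqrt{1 - \left(1 - e^{-\kappa_1 x^{\alpha} - \kappa_2 x}\right)^2}} = \sum_{n=0}^{\infty} \binom{2n}{n} \frac{1}{4^n} \left(1 - e^{-\kappa_1 x^{\alpha} - \kappa_2 x}\right)^{2n}. \tag{8}$$
By the binomial theorem, the expression 8 can also be written as
$$\frac{1}{\sqrt{1 - \left(1 - e^{-\kappa_1 x^{\alpha} - \kappa_2 x}\right)^2}} = \sum_{n=0}^{\infty} \sum_{i=0}^{2n} \binom{2n}{n} \binom{2n}{i} \frac{(-1)^i}{4^n} e^{-i\left(\kappa_1 x^{\alpha} + \kappa_2 x\right)}. \tag{9}$$
Using expression 9 in 6, we have
$$\mu'_r = \frac{2}{\pi} \sum_{n=0}^{\infty} \sum_{i=0}^{2n} \binom{2n}{n} \binom{2n}{i} \frac{(-1)^i}{4^n} \int_0^{\infty} x^r \left(\alpha \kappa_1 x^{\alpha-1} + \kappa_2\right) e^{-\kappa_1 (i+1) x^{\alpha}} e^{-\kappa_2 (i+1) x}\,dx. \tag{10}$$
Using the series of $e^{-t}$, we have
$$e^{-t} = \sum_{j=0}^{\infty} \frac{(-1)^j t^j}{j!}. \tag{11}$$
Let $t = \kappa_1 (i+1) x^{\alpha}$; then, using the expression 11, we get
$$e^{-\kappa_1 (i+1) x^{\alpha}} = \sum_{j=0}^{\infty} \frac{(-1)^j \kappa_1^j (i+1)^j x^{\alpha j}}{j!}. \tag{12}$$
Using expression 12 in 10 and integrating term by term, we get
$$\mu'_r = \frac{2}{\pi} \sum_{n=0}^{\infty} \sum_{i=0}^{2n} \sum_{j=0}^{\infty} \binom{2n}{n} \binom{2n}{i} \frac{(-1)^{i+j} \kappa_1^j (i+1)^j}{4^n\, j!} \left[ \frac{\alpha \kappa_1 \Gamma\left(r + \alpha(j+1)\right)}{\left(\kappa_2 (i+1)\right)^{r + \alpha(j+1)}} + \frac{\kappa_2 \Gamma\left(r + \alpha j + 1\right)}{\left(\kappa_2 (i+1)\right)^{r + \alpha j + 1}} \right].$$
For different values of $\alpha$ and $\kappa_2$, and a fixed value of $\kappa_1$, the plots of the mean, variance, skewness, and kurtosis of the ASM-Weibull distribution are presented in Figs 4 and 5.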
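Furthermore, the mgf (moment generating function) of the ASM-Weibull distribution, denoted by $M_X(t)$, has the form
$$M_X(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!}\, \mu'_r,$$
where $\mu'_r$ denotes the r th moment of X.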
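As a numerical cross-check on the series expressions, the moments can also be approximated directly from the defining integral $\mu'_r = \int_0^{\infty} x^r g(x)\,dx$. The sketch below uses a simple midpoint rule on a truncated range; the truncation point `upper`, the number of `steps`, and all function names are assumptions chosen for illustration. The defaults are reasonable only when the pdf is bounded near zero (for instance $\alpha \ge 1$) and the tail beyond `upper` is negligible.

```python
import math


def asm_weibull_pdf(x, alpha, k1, k2):
    """ASM-Weibull pdf in the equivalent form
    g(x) = (2/pi) * (alpha*k1*x**(alpha-1) + k2) * exp(-z/2) / sqrt(2 - exp(-z)),
    with z = k1*x**alpha + k2*x and x > 0."""
    z = k1 * x**alpha + k2 * x
    rate = alpha * k1 * x ** (alpha - 1.0) + k2
    return (2.0 / math.pi) * rate * math.exp(-0.5 * z) / math.sqrt(2.0 - math.exp(-z))


def raw_moment(r, alpha, k1, k2, upper=40.0, steps=50_000):
    """Midpoint-rule approximation of E[X**r] over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x**r * asm_weibull_pdf(x, alpha, k1, k2)
    return total * h


def moment_summary(alpha, k1, k2):
    """Mean, variance, skewness and kurtosis computed from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, alpha, k1, k2) for r in (1, 2, 3, 4))
    var = m2 - m1**2
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1**3) / var**1.5
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1**2 * m2 - 3.0 * m1**4) / var**2
    return {"mean": m1, "variance": var, "skewness": skew, "kurtosis": kurt}
```

A quick sanity check is that `raw_moment(0, ...)` should be close to 1, since the pdf integrates to one.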

Maximum likelihood estimation
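Here, we derive the maximum likelihood estimators (MLEs) of the ASM-Weibull distribution parameters based on complete samples only. Let $x_1, x_2, \ldots, x_n$ represent the observed values from the ASM-Weibull distribution with parameters $\alpha$, $\kappa_1$ and $\kappa_2$. Corresponding to Eq 4, the total log-likelihood function $\ell(\alpha, \kappa_1, \kappa_2)$ is given by
$$\ell(\alpha, \kappa_1, \kappa_2) = n \log\left(\frac{2}{\pi}\right) + \sum_{i=1}^{n} \log\left(\alpha \kappa_1 x_i^{\alpha-1} + \kappa_2\right) - \sum_{i=1}^{n} \left(\kappa_1 x_i^{\alpha} + \kappa_2 x_i\right) - \frac{1}{2} \sum_{i=1}^{n} \log\left(1 - F_i^2\right), \tag{13}$$
where $F_i = 1 - e^{-\kappa_1 x_i^{\alpha} - \kappa_2 x_i}$. The numerical maximization of $\ell(\alpha, \kappa_1, \kappa_2)$ can be done either by using computer software or via differentiation with respect to $\alpha$, $\kappa_1$ and $\kappa_2$. Corresponding to Eq 13, the partial derivatives are as follows:
$$\frac{\partial \ell}{\partial \alpha} = \sum_{i=1}^{n} \frac{\kappa_1 x_i^{\alpha-1} \left(1 + \alpha \log x_i\right)}{\alpha \kappa_1 x_i^{\alpha-1} + \kappa_2} - \kappa_1 \sum_{i=1}^{n} x_i^{\alpha} \log x_i + \kappa_1 \sum_{i=1}^{n} \frac{F_i \left(1 - F_i\right) x_i^{\alpha} \log x_i}{1 - F_i^2},$$
$$\frac{\partial \ell}{\partial \kappa_1} = \sum_{i=1}^{n} \frac{\alpha x_i^{\alpha-1}}{\alpha \kappa_1 x_i^{\alpha-1} + \kappa_2} - \sum_{i=1}^{n} x_i^{\alpha} + \sum_{i=1}^{n} \frac{F_i \left(1 - F_i\right) x_i^{\alpha}}{1 - F_i^2},$$
$$\frac{\partial \ell}{\partial \kappa_2} = \sum_{i=1}^{n} \frac{1}{\alpha \kappa_1 x_i^{\alpha-1} + \kappa_2} - \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \frac{F_i \left(1 - F_i\right) x_i}{1 - F_i^2}.$$
Solving $\frac{\partial}{\partial \alpha} \ell(\alpha, \kappa_1, \kappa_2) = 0$, $\frac{\partial}{\partial \kappa_1} \ell(\alpha, \kappa_1, \kappa_2) = 0$ and $\frac{\partial}{\partial \kappa_2} \ell(\alpha, \kappa_1, \kappa_2) = 0$ simultaneously yields the MLEs of $(\alpha, \kappa_1, \kappa_2)$. For more information and extensive reading about MLEs, we refer to [31-34].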
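The log-likelihood and its partial derivatives can be coded directly, and checking the analytic derivatives against finite differences is a cheap way to catch algebra slips before passing them to an optimizer. The sketch below is an illustration in Python rather than the R code used in the study; the function names are assumptions. It uses the simplification $1 - F^2 = e^{-z}(2 - e^{-z})$ with $z = \kappa_1 x^{\alpha} + \kappa_2 x$, so that $F(1-F)/(1-F^2) = (1 - e^{-z})/(2 - e^{-z})$.

```python
import math


def log_likelihood(params, data):
    """ASM-Weibull log-likelihood for positive observations in data."""
    alpha, k1, k2 = params
    total = len(data) * math.log(2.0 / math.pi)
    for x in data:
        z = k1 * x**alpha + k2 * x
        total += math.log(alpha * k1 * x ** (alpha - 1.0) + k2) - z
        # -(1/2) * log(1 - F**2), using 1 - F**2 = exp(-z) * (2 - exp(-z))
        total -= 0.5 * (-z + math.log(2.0 - math.exp(-z)))
    return total


def score(params, data):
    """Analytic partial derivatives of the log-likelihood (alpha, k1, k2)."""
    alpha, k1, k2 = params
    d_alpha = d_k1 = d_k2 = 0.0
    for x in data:
        xa = x**alpha
        z = k1 * xa + k2 * x
        dens = alpha * k1 * xa / x + k2  # alpha*k1*x**(alpha-1) + k2
        w = -math.expm1(-z) / (2.0 - math.exp(-z))  # F*(1-F)/(1-F**2)
        log_x = math.log(x)
        d_alpha += k1 * (xa / x) * (1.0 + alpha * log_x) / dens - (1.0 - w) * k1 * xa * log_x
        d_k1 += alpha * (xa / x) / dens - (1.0 - w) * xa
        d_k2 += 1.0 / dens - (1.0 - w) * x
    return d_alpha, d_k1, d_k2
```

In practice these two functions would be handed to a numerical optimizer (the study uses `optim()` in R); the MLEs are the parameter values at which all three components of `score` vanish.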

Simulation study
In this section, we provide a brief Monte Carlo simulation study to evaluate the MLEs of the ASM-Weibull distribution parameters. The ASM-Weibull distribution is easily simulated by inverting the expression 3. Let U have the uniform distribution U(0,1); then the nonlinear equation obtained by inverting Eq 3 is
$$\kappa_1 x^{\alpha} + \kappa_2 x + \log\left(1 - \sin\left(\frac{\pi U}{2}\right)\right) = 0.$$
The simulation is performed for two different sets of parameters: (i) $\alpha = 0.7$, $\kappa_1 = 1$, $\kappa_2 = 0.5$, and (ii) $\alpha = 1.2$, $\kappa_1 = 1.4$, $\kappa_2 = 0.5$.
The random numbers are generated via the inverse cdf. The inversion and the simulation results are obtained in the statistical software R using the rootSolve library with the command mle. Samples of sizes n = 10, 20, . . ., 500 are generated, and each setting is replicated 500 times. For the maximization of the expression 13, the algorithm "L-BFGS-B" is used with optim(). For i = 1, 2, . . ., 500, the MLEs $(\hat{\alpha}_i, \hat{\kappa}_{1i}, \hat{\kappa}_{2i})$ of $(\alpha, \kappa_1, \kappa_2)$ are obtained for each set of simulated data. Biases and mean square errors (MSEs) are used as assessment tools. These quantities are calculated as follows:
$$\mathrm{Bias}(\hat{\Theta}) = \frac{1}{500} \sum_{i=1}^{500} \left(\hat{\Theta}_i - \Theta\right), \qquad \mathrm{MSE}(\hat{\Theta}) = \frac{1}{500} \sum_{i=1}^{500} \left(\hat{\Theta}_i - \Theta\right)^2,$$
where $\Theta = (\alpha, \kappa_1, \kappa_2)$ and the quantities are computed for each parameter separately. The coverage probabilities (CPs) are calculated for the 95% confidence intervals (C.I).
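The assessment quantities are simple averages over the Monte Carlo replications. As a minimal sketch, assuming the estimates (and, for the coverage probability, the confidence interval limits) from each replication have already been collected in lists, they could be computed as follows. The helper names and the toy numbers in the usage note are hypothetical and are not results from the study.

```python
def bias_and_mse(estimates, true_value):
    """Average error and average squared error of the estimates of one parameter."""
    n_rep = len(estimates)
    bias = sum(est - true_value for est in estimates) / n_rep
    mse = sum((est - true_value) ** 2 for est in estimates) / n_rep
    return bias, mse


def coverage_probability(intervals, true_value):
    """Fraction of (lower, upper) confidence intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)
```

For instance, with the made-up estimates `[0.6, 0.8, 0.7, 0.9]` of a true value 0.7, `bias_and_mse` returns a bias of about 0.05 and an MSE of about 0.015. In a full study the same call would be repeated for each parameter and each sample size.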
The summary measures of the simulated data are presented in Table 1, and the box plots are provided in Fig 6. For the simulated data set 1, the histogram and kernel density estimator are also presented, and the corresponding simulation results are given in Table 2.
For the simulated data set 2, (i) the histogram and kernel density estimator are presented in Fig 10, (ii) the fitted pdf and cdf are sketched in Fig 11, and (iii) the Kaplan-Meier survival and QQ (quantile-quantile) plots are provided in Fig 12. The corresponding simulation results are given in Table 3.

Applications to COVID-19 data sets
The main motivation for deriving the ASM-Weibull distribution is its use in data analysis, which makes it useful in many fields, particularly those dealing with lifetime analysis. Here, this feature is illustrated using two data sets related to the COVID-19 epidemic.
We illustrate the fitting power of the ASM-Weibull distribution in comparison with well-known two-, three- and four-parameter competing lifetime distributions, namely the inverse Weibull (IW), extended odd Weibull exponential (ETOWE), Kumaraswamy Weibull (Ku-W), odd log-logistic modified Weibull (OLL-MW), and Frechet Weibull (FW) distributions. The pdfs of the competing models include
• Ku-W distribution: $f(x) = a b \alpha \kappa_1 x^{\alpha-1} e^{-\kappa_1 x^{\alpha}} \left(1 - e^{-\kappa_1 x^{\alpha}}\right)^{a-1} \left[1 - \left(1 - e^{-\kappa_1 x^{\alpha}}\right)^{a}\right]^{b-1}, \quad x > 0.$

PLOS ONE
• OLL-MW distribution, defined for x > 0.
We show that the ASM-Weibull distribution provides the best fit to the lifetime data related to the COVID-19 epidemic. The term "best fit" is used in the sense that the proposed model attains smaller values of the criteria selected for comparison. These criteria consist of the following discrimination measures:
• The AIC (Akaike information criterion): $\mathrm{AIC} = 2k - 2\ell$,
• The CAIC (Corrected Akaike information criterion): $\mathrm{CAIC} = \frac{2nk}{n - k - 1} - 2\ell$,
• The BIC (Bayesian information criterion): $\mathrm{BIC} = k \log(n) - 2\ell$,
• The HQIC (Hannan-Quinn information criterion): $\mathrm{HQIC} = 2k \log\left(\log(n)\right) - 2\ell$,
where $\ell$ is the value of the log-likelihood function evaluated at the MLEs, k is the number of parameters of the model, and n is the sample size.
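The four discrimination measures depend only on the maximized log-likelihood, the number of parameters, and the sample size, so they are straightforward to compute once a model has been fitted. The sketch below assumes the standard definitions AIC = 2k - 2ℓ, CAIC = 2nk/(n - k - 1) - 2ℓ, BIC = k log(n) - 2ℓ and HQIC = 2k log(log(n)) - 2ℓ; the function name is illustrative.

```python
import math


def information_criteria(loglik, k, n):
    """Return AIC, CAIC, BIC and HQIC for a fitted model.

    loglik: maximized log-likelihood, k: number of parameters, n: sample size.
    """
    if n - k - 1 <= 0:
        raise ValueError("CAIC requires n > k + 1")
    return {
        "AIC": 2.0 * k - 2.0 * loglik,
        "CAIC": 2.0 * n * k / (n - k - 1) - 2.0 * loglik,
        "BIC": k * math.log(n) - 2.0 * loglik,
        "HQIC": 2.0 * k * math.log(math.log(n)) - 2.0 * loglik,
    }
```

For a three-parameter model fitted to 53 observations, the call would be `information_criteria(loglik, k=3, n=53)`, with `loglik` taken from the optimizer output; smaller values indicate a better trade-off between fit and complexity.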
In addition to these measures, we also consider other important goodness of fit measures, including the Anderson-Darling (AD) statistic, the Cramer-von Mises (CM) statistic and the Kolmogorov-Smirnov (KS) statistic with its p-value; for detailed information about these measures, see [35]. The model with the lowest values of the above-mentioned measures (and the highest p-value) is chosen as the best model for the real data set.
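Of the three goodness of fit measures, the KS statistic is the simplest to compute by hand: it is the largest vertical distance between the empirical cdf of the sample and the fitted cdf. The sketch below illustrates that definition only (the p-value and the AD and CM statistics are not computed here). The function names and the five observations in the usage note are hypothetical.

```python
import math


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical cdf of data and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The empirical cdf jumps from (i-1)/n to i/n at the i-th order statistic.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def asm_weibull_cdf(x, alpha, k1, k2):
    """ASM-Weibull cdf: (2/pi) * arcsin(1 - exp(-k1*x**alpha - k2*x)), x > 0."""
    return (2.0 / math.pi) * math.asin(-math.expm1(-(k1 * x**alpha + k2 * x)))
```

A usage example with made-up survival times would be `ks_statistic([1.0, 2.5, 4.0, 6.5, 9.0], lambda x: asm_weibull_cdf(x, 0.8290, 0.5332, 0.0156))`, where the parameter values are the estimates reported for the first data set.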

For the computation of the numerical results, we estimate the model parameters iteratively using the optim() R function with the argument method = "BFGS". The numerical estimates of the unknown parameters of the ASM-Weibull and the other fitted distributions are obtained using the R package AdequacyModel with the "BFGS" algorithm.

Survival times of the COVID-19 patients data
In this subsection, we consider the survival times of patients suffering from COVID-19 in China. The data set represents the survival times of patients from the time of admission to the hospital until death. It covers a group of fifty-three (53) COVID-19 patients who were found in critical condition in hospital from January to February 2020.

The summary measures of the first data set are provided in Table 4. The histogram of the COVID-19 data and the total time test (TTT) plot are sketched in Fig 13; the histogram shows that the data set is right-skewed.
The MLEs of the ASM-Weibull and other models are provided in Table 5. The discrimination measures of the fitted distributions are provided in Table 6, and the goodness of fit measures are provided in Table 7.
From the values of the criteria provided in Tables 6 and 7, we see that the ASM-Weibull model clearly outperforms its competitors. Indeed, for the COVID-19 lifetime data, it attains smaller values of the AIC, CAIC, BIC, HQIC, CM, AD and KS measures and a higher p-value than the second-best distribution.
Furthermore, for the COVID-19 lifetime data, a graphical check of the fit of the ASM-Weibull model is presented in Fig 14. For this purpose, we consider the curves of the estimated pdf and cdf together with the PP (probability-probability) and Kaplan-Meier survival plots of the ASM-Weibull distribution. For the ASM-Weibull model, the estimated cdf and pdf are given by $G(x;\hat{\alpha},\hat{\kappa}_1,\hat{\kappa}_2)$ and $g(x;\hat{\alpha},\hat{\kappa}_1,\hat{\kappa}_2)$, respectively, where $G(x;\alpha,\kappa_1,\kappa_2)$ is defined by Eq 3, $g(x;\alpha,\kappa_1,\kappa_2)$ is defined by Eq 4, and $(\hat{\alpha},\hat{\kappa}_1,\hat{\kappa}_2)$ are the obtained MLEs of $(\alpha,\kappa_1,\kappa_2)$. For instance, based on Eq 3 and the second row of Table 5, the estimated cdf of the ASM-Weibull distribution plotted in Fig 14 is given by
$$G(x;\hat{\alpha},\hat{\kappa}_1,\hat{\kappa}_2) = \frac{2}{\pi} \arcsin\left(1 - e^{-0.5332 x^{0.8290} - 0.0156 x}\right), \qquad x \ge 0.$$
Corresponding to the second data set, the summary measures are provided in Table 8. The histogram and TTT plots are sketched in Fig 17. For the second data set, the MLEs of the ASM-Weibull and other models are provided in Table 9. The discrimination and goodness of fit measures are provided in Tables 10 and 11, respectively.
From the values of the selected criteria reported in Tables 10 and 11, we see that the ASM-Weibull model is the better model, as it attains smaller values of the AIC, CAIC, BIC, HQIC, CM, AD and KS measures and a higher p-value than the second-best distribution. Furthermore, for the second COVID-19 data set, the pdf, cdf, PP (probability-probability) and Kaplan-Meier survival plots of the ASM-Weibull distribution are presented in Fig 18. The graphs sketched in Fig 18 show that the ASM-Weibull distribution provides the best description of the COVID-19 mortality rate data. For the second data set, the likelihood function is plotted in Figs 19 and 20, which confirm the existence and uniqueness of the MLEs, respectively.

Concluding remarks
The two-parameter Weibull model has shown great applicability in the practice of statistical sciences, particularly in reliability engineering and in the biomedical and financial sciences. In this study, a new modification of the Weibull model is introduced using the modified Weibull distribution with the "Arcsine strategy". The proposed model is called the arcsine modified Weibull distribution. The maximum likelihood estimators of the ASM-Weibull parameters are obtained and a Monte Carlo simulation study is conducted. To show the applicability of the ASM-Weibull model, two real-life data sets related to COVID-19 events are considered. The proposed model is compared with other well-known competitors. To assess how closely the fitted distributions match the data, certain analytical tools, including four discrimination measures and three goodness of fit measures as well as the p-value, are considered. Based on these analytical measures, we showed that the ASM-Weibull model provides a better fit than the other competitors, a conclusion supported by both graphical and numerical tools. Furthermore, for the COVID-19 data sets, the log-likelihood function is also plotted, confirming the existence and uniqueness of the MLEs. We hope that, beyond the scope of this paper, the ASM-Weibull distribution can be applied to analyze other forms of COVID-19 data.