Statistical inferences for type-II hybrid censoring data from the alpha power exponential distribution

Abstract

This paper describes a method for estimating the location parameter μ > 0 and the scale parameter λ > 0, with the shape parameter α held fixed, of the alpha power exponential distribution (APED) under type-II hybrid censored (T-IIHC) samples. We compute the maximum likelihood estimates (MLEs) of (μ, λ) by applying the Newton–Raphson method (NRM) and the expectation maximization algorithm (EMA). In addition, the reliability and hazard functions are estimated by applying the invariance property of MLEs. We calculate the Fisher information matrix (FIM) by applying the missing information principle, which is needed to construct the asymptotic confidence intervals. Finally, the proposed estimation methods are compared in simulation studies, and a simulated example and a real data example are analyzed to illustrate them.

1 Introduction

In experiments involving life testing and reliability, complete failure time information may not be obtained for all items. The data obtained from this type of experiment are called censored data, for which cost effectiveness and minimization of the total testing time are important. Various censoring schemes that can be used in reliability analysis have been published. The two most frequently considered censoring schemes in the reliability literature are the type-I censoring scheme (T-ICS) and the type-II censoring scheme (T-IICS). In T-ICS, a test stops at a prefixed time T, whereas in T-IICS, the test stops after a prefixed number r of test units have failed. Thus, the number of observed failures is random in T-ICS and the duration of the experiment is random in T-IICS.

T-ICS and T-IICS can be combined to form a hybrid censoring scheme (HCS), as first presented by Epstein [1], who studied the properties of the one-parameter exponential distribution. For more details about HCS, the reader is referred to Balakrishnan and Kundu [2]. HCSs are categorized as type-I and type-II. A type-I hybrid censoring scheme (T-IHC) is defined as follows: the test is stopped at the random time T* = min{X_{r:n}, T}, where r and T are prefixed numbers, n is the sample size, and X_{r:n} indicates the time of the rth failure. The T-IHC has been examined in many papers, for example Kundu [3] and Kundu and Pradhan [4]. If the test is instead stopped at time T* = max{X_{r:n}, T}, the HCS is called T-IIHC [5]. The advantage of this approach is that it allows the complete lifetimes of at least r units to be recorded before the experiment is stopped. Several authors, such as Banerjee and Kundu [6], Balakrishnan and Shafay [7], Singh et al. [8], and Salah [9–11], considered statistical inference under T-IIHC. Dey and Pradhan [12] studied the estimation of the generalized inverted exponential distribution under an HCS. The weighted exponential distribution under T-IIHC was considered by Kohansal et al. [13]. Singh et al. [14] presented an estimation procedure for the two-parameter Lomax distribution under a T-IIHC. They considered applying MLEs and Bayes' estimation for parameter and reliability studies; see also Dong et al. [15]. Sharma [16] discussed estimation and sample prediction problems based on a T-IIHC sample from the flexible Weibull distribution. Recently, Sen et al. [17] studied the T-IIHC for the generalized exponential distribution.
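The two hybrid stopping rules differ only in replacing min by max. The following minimal sketch contrasts them; the function name `hybrid_stop` and the five failure times are illustrative assumptions, not taken from the paper.

```python
def hybrid_stop(times, r, T, scheme):
    """Return (stop_time, observed_failures) for a hybrid censoring scheme.

    times  : failure times of all n units, in any order
    r      : prefixed number of failures
    T      : prefixed time
    scheme : "I" uses T* = min(x_{r:n}, T); "II" uses T* = max(x_{r:n}, T)
    """
    ordered = sorted(times)
    x_r = ordered[r - 1]                      # time of the r-th failure
    if scheme == "I":
        stop = min(x_r, T)
    elif scheme == "II":
        stop = max(x_r, T)
    else:
        raise ValueError("scheme must be 'I' or 'II'")
    observed = [x for x in ordered if x <= stop]
    return stop, observed


times = [2.5, 0.5, 1.5, 1.0, 2.0]             # illustrative values only
stop_1, obs_1 = hybrid_stop(times, r=3, T=1.2, scheme="I")
stop_2, obs_2 = hybrid_stop(times, r=3, T=1.2, scheme="II")
print(stop_1, obs_1)                          # type-I hybrid: may see fewer than r failures
print(stop_2, obs_2)                          # type-II hybrid: always at least r failures
```

In this toy case the type-I hybrid test stops at T with only two failures recorded, whereas the type-II hybrid test continues until the third failure.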

In recent years, many distributions have been generalized by adding more shape parameters because many applications in engineering, finance, biomedicine, and environmental science indicated that the classical distributions are not flexible enough to explain the data sets. Hence, to make effective progress in these applications, continued extension of these distributions is required. More recently, Mahdavi and Kundu [18] presented a novel method, called alpha power transformation (APT), for adding an extra parameter to a continuous distribution. The proposed method is useful for incorporating skewness in a family of distributions. They applied their method to the one-parameter exponential distribution and produced the two-parameter APED. Additionally, they introduced many properties of the APED, such as explicit expressions for the order statistics, moment functions and quantiles. The probability density function (PDF) and the cumulative distribution function (CDF) of the APED are, respectively, (1) (2) The corresponding reliability and hazard rate functions are (3) and (4) where X > 0, μ > 0 and λ > 0. Note that in the rest of this paper, it is assumed that α ≠ 1. For more details about APED see Salah [19].
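As a numerical aid, the sketch below codes one plausible form of the APED functions. It is an assumption, not a transcription of Eqs (1) to (4): the alpha power exponential CDF of Mahdavi and Kundu [18] is shifted by μ and λ is treated as a rate, so that F(x) = (α^{1 − exp(−λ(x − μ))} − 1)/(α − 1) for x > μ. Under this assumed form, the reliability at t = 3 with α = 10, μ = 2 and λ = 1 reproduces the value R(t) = 0.6348 quoted in Section 4. The ratio f(t)/F(t) evaluates to about 1.1048, the value quoted there for H(t), whereas f(t)/R(t) is about 0.636; the sketch therefore prints both ratios without asserting which one Eq (4) defines.

```python
import math


def aped_cdf(x, alpha, mu, lam):
    """Assumed APED CDF with location mu and rate lam (alpha > 0, alpha != 1)."""
    if x <= mu:
        return 0.0
    w = 1.0 - math.exp(-lam * (x - mu))
    return (alpha ** w - 1.0) / (alpha - 1.0)


def aped_pdf(x, alpha, mu, lam):
    """Derivative of aped_cdf with respect to x."""
    if x < mu:
        return 0.0
    e = math.exp(-lam * (x - mu))
    return math.log(alpha) / (alpha - 1.0) * lam * e * alpha ** (1.0 - e)


def aped_reliability(x, alpha, mu, lam):
    """R(x) = 1 - F(x)."""
    return 1.0 - aped_cdf(x, alpha, mu, lam)


alpha, mu, lam, t = 10.0, 2.0, 1.0, 3.0       # simulation settings of Section 4
R = aped_reliability(t, alpha, mu, lam)
f = aped_pdf(t, alpha, mu, lam)
print(round(R, 4))                             # 0.6348
print(f / R, f / (1.0 - R))                    # f/R and f/F at t = 3
```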

The primary purpose of the present paper is to propose an estimation method for the parameters and the reliability characteristics of the APED using incomplete sample observations obtained by a T-IIHC. To the best of our knowledge, no attempt has been made to estimate these characteristics for the APED using a T-IIHC. This study aims to fill this gap by computing the MLEs via the EMA and comparing the outcomes with those calculated using the NRM. Furthermore, the asymptotic confidence intervals (ACIs) for the APED parameters are computed. The rest of this article is organized as follows. In the following section, the MLEs of the unknown parameters and of the reliability and hazard functions are discussed. The ACIs and the FIM are presented in Section 3. A simulation study is reported in Section 4. A simulated data set and a real data set are examined for illustrative purposes in Section 5, while conclusions are discussed in Section 6.

2 Maximum likelihood estimation

Maximum likelihood estimation is an important and widely used method for fitting statistical models because of its attractive properties, such as asymptotic efficiency, consistency, and asymptotic unbiasedness. Here, we describe how the MLEs of the model parameters are obtained from T-IIHC samples via the NRM and EMA.

2.1 Newton–Raphson algorithm

Consider a sample of n units and let T be a preselected experimental time and r a predetermined number of units out of a total of n units. The experiment ends at time T* = max{x_{r:n}, T}. Furthermore, let J be the number of failures that occur before time T. Then, under the T-IIHC, we have one of the following observations:

Case 1: x_{1:n} < x_{2:n} < … < x_{r:n} if x_{r:n} > T, or

Case 2: x_{1:n} < … < x_{r:n} < x_{r+1:n} < … < x_{J:n} < T < x_{J+1:n} if r ≤ J < n and x_{J:n} < T < x_{J+1:n}, or

Case 3: x_{1:n} < x_{2:n} < … < x_{n:n} < T, if x_{n:n} < T.

The likelihood functions of these three different cases are: Hence, (5) By combining the three likelihood functions, one obtains where k and c are given by The log-likelihood function ln L(μ, λ), without the constant term, can be written as (6) The MLEs of μ and λ are obtained by differentiating Eq (6) with respect to μ and λ and setting the derivatives equal to zero. The resulting simultaneous equations are expressed as (7) and (8) It is difficult to obtain an analytical solution to these nonlinear equations. Therefore, we can estimate the parameters μ and λ using statistical software or by solving the two simultaneous Eqs (7) and (8) numerically by, for example, the NRM with a good initial guess (μ^{(0)}, λ^{(0)}).
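To make the structure of the censored log-likelihood concrete, the sketch below uses the generic form that is standard for type-II hybrid censoring: up to a constant, the sum of ln f(x_{i:n}) over the k observed failures plus (n − k) ln[1 − F(c)], with (k, c) = (r, x_{r:n}) in Case 1, (J, T) in Case 2, and k = n with no censoring term in Case 3. This is assumed, not confirmed, to agree with Eq (6). The APED density is the hypothetical location-rate form F(x) = (α^{1 − exp(−λ(x − μ))} − 1)/(α − 1), and the function names and sample values are illustrative.

```python
import math


def aped_logpdf(x, alpha, mu, lam):
    """Log-density of the assumed location-rate APED form (x >= mu)."""
    z = lam * (x - mu)
    return (math.log(math.log(alpha) / (alpha - 1.0)) + math.log(lam)
            - z + (1.0 - math.exp(-z)) * math.log(alpha))


def aped_logsf(x, alpha, mu, lam):
    """Log of the reliability 1 - F(x) under the same assumed form."""
    w = 1.0 - math.exp(-lam * (x - mu))
    return math.log(1.0 - (alpha ** w - 1.0) / (alpha - 1.0))


def censoring_summary(ordered, r, T):
    """Return (k, c): number of observed failures and censoring point."""
    n = len(ordered)
    J = sum(1 for x in ordered if x < T)      # failures before T
    if J < r:
        return r, ordered[r - 1]              # Case 1: stop at x_{r:n}
    if J < n:
        return J, T                           # Case 2: stop at T
    return n, ordered[-1]                     # Case 3: complete sample


def loglik(mu, lam, alpha, ordered, r, T):
    """T-IIHC log-likelihood without the constant term (generic form)."""
    if lam <= 0.0 or mu > ordered[0]:
        return -math.inf                      # outside the parameter space
    n = len(ordered)
    k, c = censoring_summary(ordered, r, T)
    ll = sum(aped_logpdf(x, alpha, mu, lam) for x in ordered[:k])
    if k < n:
        ll += (n - k) * aped_logsf(c, alpha, mu, lam)
    return ll


sample = [2.2, 2.5, 3.0, 3.4, 4.1]            # illustrative ordered lifetimes
print(censoring_summary(sample, r=3, T=2.8))  # Case 1
print(loglik(2.0, 1.0, 10.0, sample, r=3, T=2.8))
```

A function of this kind can be handed to any numerical optimizer, or differentiated to obtain score equations of the type solved by the NRM.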

Utilizing the invariance property (replacing μ and λ by their ML estimators μ̂ and λ̂), we can obtain the MLEs of the reliability and hazard functions from Eqs (3) and (4) as (9) and (10)

2.2 Expectation maximization algorithm

The EMA is a very powerful method for finding MLEs for parametric models when the data are censored; see Dempster et al. [20]. It consists of two iterative steps: (i) the expectation step (E-step) and (ii) the maximization step (M-step). The E-step of each iteration computes only the conditional expectation of the log-likelihood with respect to the incomplete data given the observed data. In the M-step, the parameter value is calculated by maximizing the expected log-likelihood function obtained in the E-step. For more details about the EMA, see McLachlan and Krishnan [21].

Let X = (X_{1:n}, X_{2:n}, …, X_{k:n}) denote the observed data and Z = (Z_1, Z_2, …, Z_{n−k}) denote the censored data. Here, for a given k, Z_1, Z_2, …, Z_{n−k} are not observable. The complete data are given by the combination W = (X, Z).

In the E-step, the conditional expectation of the complete-data log-likelihood, given the observed sample, must be calculated. Hence, the log-likelihood function for the complete data is (11) and (12) Now, for j = 1, …, n − k, the PDF of Z_j given X_{1:n} = x_{1:n}, X_{2:n} = x_{2:n}, …, X_{k:n} = x_{k:n} is given by (see Ng et al. [22]) (13) According to Eq (13), we can write (14) and (15) Next, the M-step involves the maximization of Eq (12); if at the h-th stage, the estimate of (μ, λ) is (μ^{(h)}, λ^{(h)}), then (μ^{(h+1)}, λ^{(h+1)}) can be estimated by maximizing (16) By taking the derivatives of Eq (16) with respect to μ and λ and setting them equal to 0, we first find λ^{(h+1)} by solving and g(λ) is given as follows (17) where (18) Then, μ^{(h+1)} is obtained as
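For right-censored units, the conditional density of a censored lifetime given the observed data is the left-truncated density f(z)/[1 − F(c)] for z > c, where c is the censoring point. The sketch below assumes that this is the form of Eq (13) and evaluates conditional expectations by simple trapezoidal quadrature in place of the closed-form expressions (14) and (15). The location-rate APED form and all function names are assumptions made for illustration.

```python
import math


def aped_cdf(x, alpha, mu, lam):
    """Assumed APED CDF with location mu and rate lam."""
    if x <= mu:
        return 0.0
    w = 1.0 - math.exp(-lam * (x - mu))
    return (alpha ** w - 1.0) / (alpha - 1.0)


def aped_pdf(x, alpha, mu, lam):
    """Density matching aped_cdf."""
    if x < mu:
        return 0.0
    e = math.exp(-lam * (x - mu))
    return math.log(alpha) / (alpha - 1.0) * lam * e * alpha ** (1.0 - e)


def conditional_pdf(z, c, alpha, mu, lam):
    """Left-truncated density of a censored lifetime, z >= c."""
    if z < c:
        return 0.0
    return aped_pdf(z, alpha, mu, lam) / (1.0 - aped_cdf(c, alpha, mu, lam))


def conditional_expectation(g, c, alpha, mu, lam, width=40.0, steps=20000):
    """Trapezoidal approximation of E[g(Z) | Z > c] on (c, c + width / lam)."""
    h = (width / lam) / steps
    total = 0.0
    for i in range(steps + 1):
        z = c + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(z) * conditional_pdf(z, c, alpha, mu, lam)
    return total * h


alpha, mu, lam, c = 10.0, 2.0, 1.0, 3.0        # illustrative values
total_mass = conditional_expectation(lambda z: 1.0, c, alpha, mu, lam)
mean_z = conditional_expectation(lambda z: z, c, alpha, mu, lam)
print(total_mass, mean_z)
```

The first quantity checks that the truncated density integrates to one; the second is the expected value of a censored lifetime, which necessarily exceeds c.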

Remark: The iterative scheme for obtaining the MLEs of (μ, λ) using the EMA is terminated when |μ^{(h+1)} − μ^{(h)}| + |λ^{(h+1)} − λ^{(h)}| < ϵ, where ϵ > 0 is a preassigned small number. When convergence occurs, the current μ^{(h+1)} and λ^{(h+1)} are taken as the MLEs of μ and λ obtained via the EMA.
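The stopping rule in the remark can be wrapped around any update map. In the sketch below, `toy_update` is a stand-in contraction with a known fixed point, used only to exercise the loop; it is not the E-step and M-step of the APED, and the driver name `run_until_converged` is likewise an assumption.

```python
def run_until_converged(update, start, eps=1e-6, max_iter=1000):
    """Iterate (mu, lam) -> update(mu, lam) until |d mu| + |d lam| < eps."""
    mu, lam = start
    for h in range(1, max_iter + 1):
        mu_new, lam_new = update(mu, lam)
        if abs(mu_new - mu) + abs(lam_new - lam) < eps:
            return mu_new, lam_new, h
        mu, lam = mu_new, lam_new
    raise RuntimeError("no convergence within max_iter iterations")


def toy_update(mu, lam):
    """Stand-in update with fixed point (2, 1); replace by real E and M steps."""
    return 0.5 * (mu + 2.0), 0.5 * (lam + 1.0)


mu_hat, lam_hat, n_iter = run_until_converged(toy_update, (1.0, 3.0))
print(mu_hat, lam_hat, n_iter)
```

Returning the iteration count alongside the estimates makes it easy to report convergence speed, as is done for the simulated example in Section 5.1.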

According to the invariance property of MLEs, the MLEs of the reliability and hazard functions of APED via the EMA can be obtained by replacing μ and λ in Eqs (3) and (4) with their EMA estimates, which gives (19) and (20) In the following section, we consider interval estimation of the parameters and of the reliability and hazard functions of APED(μ, λ) under type-II hybrid censored data.

3 Confidence intervals

3.1 Asymptotic confidence intervals

In this subsection, we derive the ACIs of μ, λ, R(t), and H(t). To achieve this aim, we use the bivariate central limit theorem to obtain the asymptotic distribution of the unknown parameters, i.e., μ and λ, and apply the delta method to determine the asymptotic distributions of R(t) and H(t).

A distinguishing property of the MLE is that its asymptotic variance is given by the inverse of the Fisher information matrix. Because the MLEs of the parameters are not obtained in closed form, it is not possible to obtain the exact Fisher information matrix and construct exact confidence intervals. Therefore, the Fisher information is approximated using the observed Fisher information evaluated at the MLE. The ACIs are then based on the asymptotic normal distribution of the MLEs, with the variance–covariance matrix approximated by the inverse of the observed Fisher information matrix evaluated at the MLE.

The MLEs (μ̂, λ̂) are approximately bivariate normal with mean (μ, λ) and variance–covariance matrix I^{−1} (see Greene [23] and Agresti [24]), where I is the observed information matrix (21) and I^{−1} is the inverse of the matrix in Eq (21). The variance–covariance matrix is then (22) Therefore, large-sample theory can be used to compute the two-sided 100(1 − γ)% estimated confidence intervals for μ and λ, respectively, as (23) where z_{γ/2} is the percentile of the standard normal distribution with right-tail probability γ/2. Moreover, to construct the ACIs for the reliability and hazard functions, we apply the delta method to estimate their variances. Let (24) Then, the asymptotic variance estimators of the estimated R(t) and H(t) are defined as (25) where I^{−1} is the variance–covariance matrix defined in Eq (22). Therefore, we have the relationships Furthermore, we can derive the 100(1 − γ)% ACIs of R(t) and H(t) by (26)
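The two ingredients of this subsection, a normal-theory interval and a delta-method variance, can be sketched as follows. The covariance matrix, gradient and point estimate are illustrative numbers, not results from the paper, and the function names are assumptions.

```python
import math
from statistics import NormalDist


def wald_interval(estimate, variance, gamma=0.05):
    """Two-sided 100(1 - gamma)% interval: estimate -/+ z_{gamma/2} * sqrt(variance)."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    half = z * math.sqrt(variance)
    return estimate - half, estimate + half


def delta_variance(grad, cov):
    """Delta-method variance g' V g for a function of (mu, lam).

    grad : partial derivatives of the function with respect to (mu, lam)
    cov  : 2 x 2 variance-covariance matrix of (mu_hat, lam_hat)
    """
    g1, g2 = grad
    return g1 * g1 * cov[0][0] + 2.0 * g1 * g2 * cov[0][1] + g2 * g2 * cov[1][1]


cov = [[0.04, 0.01], [0.01, 0.09]]            # illustrative covariance matrix
lo_mu, hi_mu = wald_interval(2.0, cov[0][0])  # interval for mu
var_fn = delta_variance((1.0, 2.0), cov)      # variance of a smooth function
print(lo_mu, hi_mu, var_fn)
```

For R(t) and H(t), the gradient passed to `delta_variance` would be the vector of partial derivatives with respect to μ and λ evaluated at the MLEs.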

3.2 Fisher information matrix

This section presents the observed FIM using the missing information principle of Louis [25], which is used to construct the ACIs: I_X(θ) = I_W(θ) − I_{W|X}(θ), (27) where θ = (μ, λ), X is the observed data, W is the complete data, I_X(θ) is the observed information, I_W(θ) is the complete information, and I_{W|X}(θ) is the missing information. For α ≠ 1, the likelihood function Lc of the APED for the complete data is (28) The log-likelihood function ln Lc of the APED for the complete data is (29) The second partial derivatives of ln Lc are The expected values of the second derivatives are Then, the complete information becomes (30) where (31) (32) and (33) The FIM of the censored data can be given as (34) where To obtain the variance–covariance matrix of μ̂ and λ̂, one can invert the observed information matrix as (35) The approximate 100(1 − γ)% confidence intervals for μ and λ are (36) where z_{γ/2} is the percentile of the standard normal distribution with right-tail probability γ/2. In addition, the 100(1 − γ)% ACIs of R(t) and H(t) are estimated using the delta method as (37) where (38) and the gradient terms are given by Eq (24).
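The missing information principle reduces, for two parameters, to a 2 × 2 matrix subtraction followed by an inversion. The sketch below shows that arithmetic only; the two input matrices are illustrative numbers, not the APED information matrices of Eqs (30) and (34), and the function names are assumptions.

```python
def observed_information(complete, missing):
    """I_X = I_W - I_{W|X} for 2 x 2 matrices given as nested lists."""
    return [[complete[i][j] - missing[i][j] for j in range(2)] for i in range(2)]


def invert_2x2(m):
    """Inverse of a 2 x 2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ZeroDivisionError("singular information matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


complete = [[30.0, 5.0], [5.0, 20.0]]          # illustrative I_W
missing = [[10.0, 1.0], [1.0, 4.0]]            # illustrative I_{W|X}
info = observed_information(complete, missing)
cov = invert_2x2(info)                          # variance-covariance matrix
print(info)
print(cov)
```

The diagonal entries of `cov` are the variances that enter the normal-theory intervals for μ and λ.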

4 Simulation study

This section presents a Monte Carlo simulation study to assess the performance of the MLEs of μ, λ, R(t), and H(t) obtained by applying the NRM and EMA. The parameter values of μ, λ, and α and the sample size n are required for this simulation. In this study, we used the parameter values α = 10, μ = 2, and λ = 1, the sample size n was set to 20, 30, and 40, and k was chosen such that the observed data were 70% and 90% censored. A mission time of t = 3.0 was taken for the survival and failure rate functions. Hence, R(t) = 0.6348 and H(t) = 1.1048. For the point estimation methods, we compared the expected values (EVs) and mean squared errors (MSEs) of the estimators for μ, λ, and the reliability and hazard functions; see Zeng et al. [26, 27]. For the interval estimation methods, the 95% confidence intervals were compared according to the average length (AL) and coverage probability (CP). For the selected options of (n, k, T), the MLEs of μ, λ, R(t) and H(t) were obtained using the NRM and EMA in 1000 replications. The results are reported in Tables 1 and 2.
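The four performance measures used in the tables can be computed from the replicated estimates and intervals as sketched below. The function name `summarize` and the four replicates are illustrative assumptions; a real study would pass the 1000 replicated estimates.

```python
def summarize(estimates, intervals, true_value):
    """Return EV, MSE, AL and CP over Monte Carlo replications.

    estimates  : point estimates, one per replication
    intervals  : (lower, upper) confidence limits, one pair per replication
    true_value : parameter value used to generate the data
    """
    m = len(estimates)
    ev = sum(estimates) / m
    mse = sum((e - true_value) ** 2 for e in estimates) / m
    al = sum(upper - lower for lower, upper in intervals) / m
    cp = sum(1 for lower, upper in intervals if lower <= true_value <= upper) / m
    return {"EV": ev, "MSE": mse, "AL": al, "CP": cp}


estimates = [1.9, 2.1, 2.3, 1.7]               # illustrative replicates
intervals = [(1.5, 2.3), (1.8, 2.4), (2.05, 2.55), (1.3, 2.1)]
summary = summarize(estimates, intervals, true_value=2.0)
print(summary)
```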

Table 1. Expected Value (EV), Mean Squared Error (MSE), Average Length (AL), and Coverage Probability (CP) of μ and λ when μ = 2, λ = 1, α = 10 and t = 3 for varying (n, k, T).

https://doi.org/10.1371/journal.pone.0244316.t001

Table 2. Expected Value (EV), Mean Squared Error (MSE), Average Length (AL), and Coverage Probability (CP) of R(t) and H(t) when μ = 2, λ = 1, α = 10 and t = 3 for varying (n, k, T).

https://doi.org/10.1371/journal.pone.0244316.t002

From these results the following conclusions can be drawn.

  (i) When the number of failures k is fixed and the sample size n increases, the MSEs and the widths of the 95% confidence intervals of the MLEs computed using both the EMA and NRM decrease. Therefore, the MLE process performs well in terms of estimating the parameters of APED. Moreover, the expected values are close to the true values.
  (ii) The parameter estimates obtained under the EMA, and their respective MSEs, are smaller than those computed via the NRM.
  (iii) As the sample size n increases, the average length of all intervals decreases. On average, the ACIs obtained via the EMA have a shorter length, and the coverages of the confidence intervals in all cases are close to 95%.
  (iv) The MSEs and the widths of the confidence intervals of the MLEs estimated by the EMA and NRM decrease as the number of failures k increases for a fixed sample size n.
  (v) The MSEs and the lengths of the ACIs for the parameters, as well as for the reliability and hazard functions, decrease for fixed n and k as T increases.

5 Numerical examples

5.1 Simulated data analysis

Here, T-IIHC data with n = 20, k = 15, T = 1.5 were generated from APED with μ = 1.5, λ = 1, and α = 3. The generated data were For reliability characteristics, we used mission time t = 3. Based on the T-IIHC sample, the MLEs using the NRM for μ, λ, R(t = 3) and H(t = 3) were computed as follows and the variance–covariance matrix is given by Then, the 95% confidence intervals for μ, λ, R(t) and H(t) when the NRM was used are (0.6649, 2.5967), (0.1218, 2.6810), (0.0411, 0.4057) and (0.0774, 0.6654), respectively.

We also used the EMA, as described in Sections 2 and 3, and stopped the iterative process when the difference between two consecutive iterations was less than 10^{−6}. The MLEs for μ and λ obtained via the EMA require 0.06 s and seven iterations to converge to and , and the MLEs R(t = 3) and H(t = 3) are and . Further, the variance–covariance matrix is given by Moreover, the 95% confidence intervals for μ, λ, R(t = 3) and H(t = 3) are (1.2894, 1.9723), (0.7797, 1.8678), (0.0031, 0.4895) and (0.0656, 0.7241), respectively.

This example shows that the MLEs obtained via the EMA are closer to the true values of the unknown parameters μ and λ than those obtained via the NRM.
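The data-generation step of this subsection can be sketched by inverse-transform sampling followed by the type-II hybrid stopping rule. The quantile function below inverts the assumed location-rate CDF F(x) = (α^{1 − exp(−λ(x − μ))} − 1)/(α − 1); the function names and the seed are arbitrary choices, so the output will not reproduce the data set used above. Under this assumed form no failure can precede T = 1.5 when μ = 1.5, so the test stops at the 15th failure (Case 1).

```python
import math
import random


def aped_quantile(u, alpha, mu, lam):
    """Inverse of the assumed APED CDF for 0 <= u < 1."""
    inner = math.log(1.0 + u * (alpha - 1.0)) / math.log(alpha)
    return mu - math.log(1.0 - inner) / lam


def generate_t2hc_sample(n, r, T, alpha, mu, lam, rng):
    """Draw n lifetimes and return (observed failures, stopping time)."""
    lifetimes = sorted(aped_quantile(rng.random(), alpha, mu, lam)
                       for _ in range(n))
    stop = max(lifetimes[r - 1], T)            # T* = max(x_{r:n}, T)
    observed = [x for x in lifetimes if x <= stop]
    return observed, stop


rng = random.Random(2021)                       # arbitrary seed
observed, stop = generate_t2hc_sample(n=20, r=15, T=1.5,
                                      alpha=3.0, mu=1.5, lam=1.0, rng=rng)
print(len(observed), stop)
```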

5.2 Real data example

In the experiment described in this subsection, one set of real data was used to demonstrate the applicability of the suggested method to real-life applications. The data, which represent the strength of a single carbon fiber and impregnated 1000-carbon fiber tows, measured in GPa, were taken from Bader and Priest [28]. Mahdavi and Kundu [18] reported the data of single carbon fibers tested at a gauge length of 1 mm as

2.247, 2.64, 2.908, 3.099, 3.126, 3.245, 3.328, 3.355, 3.383, 3.572, 3.581, 3.681, 3.726, 3.727, 3.728, 3.783, 3.785, 3.786, 3.896, 3.912, 3.964, 4.05, 4.063, 4.082, 4.111, 4.118, 4.141, 4.246, 4.251, 4.262, 4.326, 4.402, 4.457, 4.466, 4.519, 4.542, 4.555, 4.614, 4.632, 4.634, 4.636, 4.678, 4.698, 4.738, 4.832, 4.924, 5.043, 5.099, 5.134, 5.359, 5.473, 5.571, 5.684, 5.721, 5.998, 6.06.

For the complete data above, Mahdavi and Kundu [18] obtained the MLEs of α, μ, and λ, which were found to be 673.8379, 2.247, and 1.1562, respectively. They examined the fit of the APED at these estimates using the Kolmogorov–Smirnov (KS) test. The KS distance was 0.0925, and the corresponding p-value was 0.7243. Therefore, they concluded that the APED provides a good fit for the data set presented above.
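The KS distance can be checked numerically as the largest gap between the empirical CDF of the 56 observations and a fitted CDF. The sketch below assumes the location-rate form F(x) = (α^{1 − exp(−λ(x − μ))} − 1)/(α − 1) for the APED and plugs in the estimates of Mahdavi and Kundu [18] quoted above; under that assumption the computed distance is roughly 0.093, in line with the reported 0.0925. The function names are illustrative.

```python
import math

data = [2.247, 2.64, 2.908, 3.099, 3.126, 3.245, 3.328, 3.355, 3.383, 3.572,
        3.581, 3.681, 3.726, 3.727, 3.728, 3.783, 3.785, 3.786, 3.896, 3.912,
        3.964, 4.05, 4.063, 4.082, 4.111, 4.118, 4.141, 4.246, 4.251, 4.262,
        4.326, 4.402, 4.457, 4.466, 4.519, 4.542, 4.555, 4.614, 4.632, 4.634,
        4.636, 4.678, 4.698, 4.738, 4.832, 4.924, 5.043, 5.099, 5.134, 5.359,
        5.473, 5.571, 5.684, 5.721, 5.998, 6.06]


def aped_cdf(x, alpha, mu, lam):
    """Assumed APED CDF with location mu and rate lam."""
    if x <= mu:
        return 0.0
    w = 1.0 - math.exp(-lam * (x - mu))
    return (alpha ** w - 1.0) / (alpha - 1.0)


def ks_distance(sample, cdf):
    """Largest gap between the empirical CDF and a fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fitted = cdf(x)
        # compare with the empirical CDF just after and just before x
        d = max(d, i / n - fitted, fitted - (i - 1) / n)
    return d


d = ks_distance(data, lambda x: aped_cdf(x, 673.8379, 2.247, 1.1562))
print(len(data), d)
```

The same `ks_distance` function can be reused with any other pair of estimates by changing the arguments passed to `aped_cdf`.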

Here, we estimate the values of μ and λ when α is known (α = 673.8379). First, we computed the MLEs of the unknown parameters using the NRM: and . The KS distance between the fitted and empirical CDFs was 0.1117, and the associated p-value was 0.487. Therefore, according to the result of the NRM, we cannot reject the assumption that the source of the data set is the two-parameter APED. Furthermore, the 95% confidence intervals of μ and λ are (1.89194, 2.60206) and (0.944682, 1.43854), respectively.

We also computed the KS distance based on the EMA, where the MLEs of μ and λ were and . The associated 95% confidence intervals were (2.0511, 2.67933) and (1.00053, 1.513), respectively. The KS distance was 0.0927 and the associated p-value was 0.7217. Based on the p-value of the KS statistic, the MLEs obtained via the EMA also provide a satisfactory fit to the data set. The empirical survival function and the fitted survival functions are drawn in Fig 1.

Fig 1. Empirical and fitted survival functions of NRM and EMA estimates for the real data.

https://doi.org/10.1371/journal.pone.0244316.g001

From the above data set, we artificially created a hybrid censored data set with n = 56, k = 50, and T = 4. Based on the T-IIHC sample, the MLEs obtained via the NRM and EMA for μ, λ, R(t = 4), and H(t = 4) were computed with the associated 95% confidence intervals; see Table 3. According to Table 3, all estimates are satisfactory for this data set.

Table 3. MLEs and 95% CIs of μ, λ, R(t) and H(t) with α = 673.8379 and T = 4, for Bader and Priest [28] data.

https://doi.org/10.1371/journal.pone.0244316.t003

6 Conclusion

In this article, statistical inference for T-IIHC data from an APED was described. The MLEs cannot be obtained analytically; therefore, the EMA and NRM were used to compute the considered parameters. A simulation study was performed to assess the performance of the different schemes for the APED, and the methods were applied to simulated and real data. In the simulation study, we noted that both the EMA and the NRM produced satisfactory results, but the EMA provided better estimates. Based on the T-IIHC sample, the MLEs obtained via the NRM and EMA for μ, λ, R(t = 4) and H(t = 4) were computed, along with the associated 95% confidence intervals, and we can state that all considered estimates are satisfactory for the real data set. Inference on the APED parameters under a balanced two-sample type-II progressive censoring scheme deserves study in future work.

References

  1. Epstein B. (1954). Truncated life-tests in the exponential case. Annals of Mathematical Statistics, Vol. 25, pp. 555–564.
  2. Balakrishnan N. and Kundu D. (2013). Hybrid Censoring: Models, Inferential Results and Applications. Computational Statistics and Data Analysis (with discussion), Vol. 57(1), pp. 166–209.
  3. Kundu D. (2007). On hybrid censored Weibull distribution. Journal of Statistical Planning and Inference, Vol. 137, pp. 2127–2142.
  4. Kundu D. and Pradhan B. (2009). Estimating the parameters of the generalized exponential distribution in presence of hybrid censoring. Communications in Statistics-Theory and Methods, Vol. 38, pp. 2030–2041.
  5. Childs A., Chandrasekhar B., Balakrishnan N. and Kundu D. (2003). Exact likelihood inference based on type-I and type-II hybrid censored samples from the exponential distribution. Annals of the Institute of Statistical Mathematics, Vol. 55(2), pp. 319–330.
  6. Banerjee A. and Kundu D. (2008). Inference based on type-II hybrid censored data from a Weibull distribution. IEEE Transactions on Reliability, Vol. 57, pp. 369–378.
  7. Balakrishnan N. and Shafay R.A. (2012). One and two-sample Bayesian prediction intervals based on type-II hybrid censored data. Communications in Statistics-Theory and Methods, Vol. 41, pp. 1511–1531.
  8. Singh S.K., Singh U. and Sharma V.K. (2013). Bayesian analysis for type-II hybrid censored sample from inverse Weibull distribution. International Journal of System Assurance Engineering and Management, Vol. 4(3), pp. 241–248.
  9. Salah M. (2016a). Parameter estimation of the Marshall-Olkin exponential distribution under type-II hybrid censoring schemes and its applications. Journal of Statistics Applications and Probability, Vol. 5(3), pp. 1–8.
  10. Salah M. (2016b). Moments of Upper Record Values from Marshall-Olkin Exponential Distribution. Journal of Statistics Applications and Probability, Vol. 5(2), pp. 257–263.
  11. Salah M. (2018). Bayesian estimation of the scale parameter of the Marshall-Olkin exponential distribution under progressively type-II censored samples. Journal of Statistical Theory and Applications, Vol. 17(1), pp. 1–14.
  12. Dey S. and Pradhan B. (2014). Generalized inverted exponential distribution under hybrid censoring. Statistical Methodology, Vol. 18, pp. 101–114.
  13. Kohansal A., Rezakhah S. and Khorram E. (2015). Parameter estimation of type-II hybrid censored weighted exponential distribution. Communications in Statistics-Simulation and Computation, Vol. 44(5), pp. 1273–1299.
  14. Singh K., Singh U. and Yadav A.S. (2017). Bayesian estimation of Lomax distribution under type-II hybrid censored data using Lindley's approximation method. International Journal of Data Science, Vol. 2, pp. 352–368.
  15. Dong B., Ma X., Chen F. and Chen S. (2018). Investigating the Differences of Single- and Multi-vehicle Accident Probability Using Mixed Logit Model. Journal of Advanced Transportation, Vol. 2018, UNSP 2702360.
  16. Sharma V.K. (2018). Estimation and prediction for type-II hybrid censored data follow flexible Weibull distribution. Statistica, Vol. 76(4), pp. 385–414.
  17. Sen T., Tripathi Y.M. and Bhattacharya R. (2018). Statistical inference and optimum life testing plans under type-II hybrid censoring scheme. Annals of Data Science, Vol. 5, pp. 679–708.
  18. Mahdavi A. and Kundu D. (2017). A new method for generating distributions with an application to exponential distribution. Communications in Statistics-Theory and Methods, Vol. 46(13), pp. 6543–6557.
  19. Salah M. (2020). On Progressive Type-II Censored Samples from Alpha Power Exponential Distribution. Journal of Mathematics, Vol. 2020, Article ID 2584184.
  20. Dempster A.P., Laird N.M. and Rubin D.B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion). Journal of the Royal Statistical Society, Series B, Vol. 39, pp. 1–39.
  21. McLachlan G. and Krishnan T. (2007). The EM algorithm and extensions, Vol. 382, John Wiley & Sons.
  22. Ng H.K.T., Chan P.S. and Balakrishnan N. (2002). Estimation of Parameters from Progressively Censored Data Using EM Algorithm. Computational Statistics and Data Analysis, Vol. 39, pp. 371–386.
  23. Greene W.H. (2000). Econometric Analysis. 4th edition. New York, Prentice Hall.
  24. Agresti A. (2002). Categorical Data Analysis. 2nd edition. New York, Wiley.
  25. Louis T.A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, Vol. 44(2), pp. 226–233.
  26. Zeng Q., Huang H., Pei X. and Wang S. (2017). A multivariate random parameters Tobit model for analyzing highway crash rate by injury severity. Accident Analysis and Prevention, Vol. 99, pp. 184–191.
  27. Zeng Q., Wen H., Wang S., Huang H., Guo Q. and Pei X. (2020). Spatial joint analysis for zonal daytime and nighttime crash frequencies using a Bayesian bivariate conditional autoregressive model. Journal of Transportation Safety and Security, Vol. 12(4), pp. 566–585.
  28. Bader M. and Priest A. (1982). Statistical aspects of fibre and bundle strength in hybrid composites. Progress in Science and Engineering of Composites, pp. 1129–1136.