A modified truncated distribution for modeling the heavy tail, engineering and environmental sciences data

Truncated models are essential for efficiently analyzing the finite data observed in almost all real-life situations. In this paper, a new four-parameter truncated distribution, named the Weibull-Truncated Exponential Distribution (W-TEXPD), is developed. The proposed model can be used as an alternative to the Exponential, standard Weibull, shifted Gamma-Weibull and three-parameter Weibull distributions. The statistical characteristics of the proposed model, including the cumulative distribution function, hazard function, cumulative hazard function, central moments, skewness, kurtosis, percentiles and entropy, are derived. The maximum likelihood estimation method is employed to estimate the unknown parameters of the W-TEXPD. A simulation study is also carried out to assess the performance of the parameter estimates. The proposed probability distribution is fitted to five data sets from different fields to demonstrate its wide applicability. A comparison of the proposed model with some existing models is given to justify the performance of the W-TEXPD.


Introduction
Truncated probability models are used when a stochastic variable is confined to some domain. They are required in almost every field, such as astronomy, epidemiology, biometry, engineering and economics. For instance, a government may be interested in the number of families living in New York City with a monthly income of more than 50,000 US dollars. Another example is the recruitment of police officials who meet the minimum prerequisite academic qualification. In engineering, measurements are taken with a detector that registers only signals above a specific limit, so weak signals are not taken into account. All of these situations call for truncated probability distributions.
Weibull and Exponential distributions are widely used in reliability and lifetime analysis due to their simplicity and ease of mathematical manipulation. The Weibull distribution was introduced by the Swedish physicist Weibull [1]. It is commonly used for modeling reliability, electrical engineering [2], mechanical engineering [3], lifetime and environmental sciences data because of the wide variety of shapes it can take. A considerable literature discussing methods for estimating the Weibull parameters is given by [4,5]. [6] states that the Weibull distribution becomes reversed J-shaped, exponential and bell-shaped for values of the shape parameter <, = and > 1, respectively. A comprehensive account of the truncated Weibull distribution is given by [7,8]. [9] fits the truncated Weibull distribution in different areas, for example to analyse the diameter of trees by truncating the data at a specific threshold level and to infer the height of small trees. [10] uses the method of moments to compute moment expressions for the two-parameter, three-parameter and truncated (left, right and doubly) Weibull distributions. The Exponential distribution is also popular for modeling data because of the availability of good estimators and its convenient mathematical properties (e.g. being memoryless). [11] defines the maximum likelihood estimator of the scale parameter of the Exponential distribution. [12] computes the parameter estimates of the truncated Gamma probability density function (pdf). [13] highlights the value of truncated probability density functions in hydrology by computing truncated moment expressions (TMEs) as well as complete moments of different densities, and notes that complete moments can be regarded as a special case of TMEs.
[14] utilizes both the skew-Cauchy (SK-CD) and truncated skew-Cauchy (TSK-CD) probability functions for modeling the exchange rate between the U.K. pound sterling and the U.S. dollar from 1800 to 2003 and concludes that the TSK-CD models the data set better than the SK-CD. [15] studies a truncated version of the Birnbaum-Saunders (BS) distribution to improve actuarial forecasting models, specifically for modeling insurance payment data that involve a deductible.
In the field of hydrology, [16] uses the generalized exponential (GE) distribution to study flood frequency for Polish rivers. [17] employs the Weibull density function to study an accrual failure detector and calls it the "Weibull Distribution Failure Detector for Cloud Computing". [18] introduces a new generalized form of the Weibull probability model, the "Alpha logarithmic transformed Weibull distribution" (ALTW), to model the failure times of engine turbochargers.
In engineering, [19] fits a micro-level spatial joint model and a macro-level model with conditional autoregressive (CAR) structure to analyze zonal crashes using three years of urban highway data from the USA. They conclude that the micro-level model fits the data better. [20] proposes a new Bayesian spatio-temporal model to study the association between the frequency of freeway incidents and other risk factors. [21] considers the mixed logit model to identify the main factors of single and multiple vehicle accidents. Similarly, in another study, [22] applies the mixed logit model to analyze the significant factors of single-vehicle (SV) and multiple-vehicle (MV) accidents using 10 years of truck driver data from rural highways in the USA. [23] develops the Weibull-Lindley distribution by compounding two distributions and highlights its worth by fitting it to three medical data sets. [24] introduces U-statistics for the Weibull distribution parameters and compares them with nine parameter estimation techniques.
Several distinct characteristics motivated us to propose the W-TEXPD: (i) it is distinguished by the introduction of a new scale parameter, obtained from the new truncated transformed distribution, along with the usual location parameter; (ii) the W-TEXPD exhibits monotonic, non-monotonic and bathtub-shaped hazard rates, which makes it a better model than lifetime models that only exhibit constant or monotonically increasing/decreasing hazard rates; (iii) various well-known classical lifetime models are special cases of the W-TEXPD; (iv) the W-TEXPD is appropriate for fitting scattered, skewed (spread) and/or heavy-tailed (flat curved) data that may not be fitted adequately by other typical probability density functions; and (v) the results of a Monte Carlo simulation study for different sample sizes reveal the stability of the model parameter estimates. Finally, we examine how well the W-TEXPD performs compared with several well-known classical lifetime models by using five data sets with skewed and heavy-tailed data.
The manuscript is organized as follows. In Section 2, the W-TEXPD is described and its characteristics, such as the hazard function, cumulative hazard function, raw and central moments, skewness, kurtosis, Shannon's entropy and order statistics, are derived. In Section 3, maximum likelihood estimates of the model parameters are obtained. In Section 4, a Monte Carlo simulation study is performed to examine the performance of the W-TEXPD for different choices of the model parameters. In Section 5, the feasibility of the proposed model is studied by fitting it to real data sets and comparing it with some baseline models. Some concluding remarks are given in Section 6.

Weibull-Truncated Exponential distribution (W-TEXPD)
[25] suggests a new method for generating a family of truncated distributions, called the T-X_T family of distributions, by using a new generating function. Let X be a non-negative random variable truncated on the left, with probability density function (pdf) f(x_T) and distribution function (cdf) F(x_T) on the domain [τ, ∞). Also let T be a random variable with pdf r(t) and cdf R(t). The cdf of the T-X_T family of distributions is then defined in terms of R(t), the cdf of the random variable T, and the corresponding pdf of the family is obtained from it. The idea presented in Eq (2) builds on the method of generating a new family of distributions, called the T-X family of distributions, proposed by [26], which is itself an extension of the Beta-generated distributions originally introduced by [27].
Suppose X is an exponential random variable with parameter θ, whose density function and corresponding cdf are used to form the T-Truncated Exponential distribution defined by [25]. Let T be a Weibull random variable with scale parameter α and shape parameter β. The Weibull-Truncated Exponential distribution (W-TEXPD) is then defined by using Eq (9), where
τ is the location parameter,
θ and α are scale parameters,
β is the shape parameter.
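As a rough illustration of the definition, the following Python sketch evaluates a cdf and pdf for the W-TEXPD. It assumes the cdf has the form G(x) = 1 − exp(−{θ(x − τ)/α}^β) for x ≥ τ, which is the form inverted by the percentile relation in this section; the pdf is simply the derivative of that assumed cdf. The function names `wtexpd_cdf` and `wtexpd_pdf` are hypothetical and are not taken from the paper.

```python
import math

def wtexpd_cdf(x, tau, theta, alpha, beta):
    """Assumed cdf: G(x) = 1 - exp(-{theta*(x - tau)/alpha}**beta) for x > tau."""
    if x <= tau:
        return 0.0
    z = theta * (x - tau) / alpha
    return 1.0 - math.exp(-z ** beta)

def wtexpd_pdf(x, tau, theta, alpha, beta):
    """Derivative of the assumed cdf with respect to x.

    The boundary point x == tau is mapped to 0 to avoid evaluating
    0 ** (beta - 1) when beta < 1; this is a convention of the sketch.
    """
    if x <= tau:
        return 0.0
    z = theta * (x - tau) / alpha
    return (beta * theta / alpha) * z ** (beta - 1.0) * math.exp(-z ** beta)

# Illustrative parameter values only (not estimates from the paper).
for x in (1.5, 2.0, 4.0):
    print(x, wtexpd_cdf(x, 1.0, 1.0, 2.0, 1.5), wtexpd_pdf(x, 1.0, 1.0, 2.0, 1.5))
```

With θ = 1 the assumed cdf is that of a three-parameter Weibull distribution, which agrees with the special case stated in the text.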
• W-TEXPD reduces to the Shifted Gamma-Weibull [28] and three-parameter Weibull [29] distributions for θ = 1.

The percentile of the W-TEXPD is obtained by setting the cdf equal to P and solving for x:

$$1 - e^{-\left\{\frac{\theta (x-\tau)}{\alpha}\right\}^{\beta}} = P, \qquad x = \tau + \frac{\alpha}{\theta}\left\{-\log(1-P)\right\}^{1/\beta}.$$

Theorem 2.1 gives the Shannon entropy of a stochastic variable X_T following the W-TEXPD.
Proof. The Shannon entropy of a random variable X_T is a measure of uncertainty, given as the expectation $E[-\log g(X_T)]$, where g is the pdf of X_T. To solve the resulting integral, we use the following combinations of logarithms and exponentials given by [30] (Jeffrey and Zwillinger, 2007, 7th edition, Eq. (4.331.1), p. 571).
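Substituting I_1 and I_2 into Eq (25) gives the entropy stated in Theorem 2.1.
Theorem 2.2 gives the pdf f_{r;n}(x) of the r-th order statistic of a random sample of size n from the W-TEXPD.
Proof. By definition,

$$f_{r;n}(x) = C_{r,n}\, g(x)\,\{G(x)\}^{r-1}\,\{1-G(x)\}^{n-r},$$

where g(x) and G(x) are the pdf and cdf of the W-TEXPD and $C_{r,n} = \frac{n!}{(r-1)!\,(n-r)!}$.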

Estimation of model parameters by using Maximum Likelihood (ML) method
In this section, we estimate the unknown parameters of the W-TEXPD by applying the maximum likelihood estimation method as defined by [31]. The log-likelihood function of the W-TEXPD is given by Eq (28). Computing the first partial derivatives of Eq (28) with respect to τ, θ, α and β and equating the results to zero gives Eqs (29) to (32), respectively, where the estimate of the location parameter is the sample minimum,

$$\hat{\tau} = \min[x_i], \quad i = 1, 2, \ldots, n. \tag{29}$$

Since Eqs (30) to (32) are not in closed form, we use a well-known iterative method, Newton-Raphson, to obtain approximate ML estimates of the parameters θ, α and β.
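The log-likelihood that an iterative optimizer would maximize can be sketched as follows. This is a minimal illustration, not the authors' R implementation: it assumes the pdf g(x) = (βθ/α) z^(β−1) exp(−z^β) with z = θ(x − τ)/α, and the function name `wtexpd_loglik` is hypothetical. In this sketch every observation must lie strictly above τ for the logarithm to be finite, so the function returns −∞ otherwise.

```python
import math

def wtexpd_loglik(data, tau, theta, alpha, beta):
    """Log-likelihood under the assumed pdf
    g(x) = (beta*theta/alpha) * z**(beta - 1) * exp(-z**beta),
    with z = theta * (x - tau) / alpha."""
    if min(theta, alpha, beta) <= 0 or any(x <= tau for x in data):
        return -math.inf  # outside the parameter space or the support
    n = len(data)
    total = n * math.log(beta * theta / alpha)
    for x in data:
        z = theta * (x - tau) / alpha
        total += (beta - 1.0) * math.log(z) - z ** beta
    return total

# Illustrative call with made-up data and parameter values.
sample = [1.5, 2.0, 4.0]
print(wtexpd_loglik(sample, tau=1.0, theta=2.0, alpha=4.0, beta=1.0))
```

Under this assumed form the value depends on θ and α only through the ratio θ/α, which is worth keeping in mind when choosing starting values for an iterative method such as Newton-Raphson.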

Asymptotic confidence bounds
The ML estimates of the unknown parameters θ, α and β of the W-TEXPD are not available in closed form. In this situation, we compute asymptotic confidence bounds for the W-TEXPD parameters based on the asymptotic distribution of the MLE. The Fisher information matrix can be used for interval estimation and hypothesis testing. For the W-TEXPD, the information matrix is obtained by computing the second partial derivatives of the log-likelihood, that is, by differentiating Eqs (30) to (32) once more. A typical entry of the Fisher information matrix of the W-TEXPD is

$$I_{\theta\beta} = \frac{\partial^2 \ln L(\tau,\theta,\alpha,\beta;x)}{\partial\theta\,\partial\beta}.$$

In the resulting confidence bounds for θ, α and β, $\delta_{\xi/2}$ denotes the $(1-\xi/2)$ percentile of the standard normal distribution. The log-normal approximation works well if the standard error of a parameter is greater than half of its point estimate.
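A minimal sketch of two standard interval constructions is given below. It assumes the usual textbook forms: estimate ± z·SE for the normal approximation and estimate·exp(±z·SE/estimate) for the log-normal approximation. Whether these match the paper's expressions term by term is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_bounds(estimate, se, level=0.95):
    """Two-sided bounds from the asymptotic normality of the MLE."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return estimate - z * se, estimate + z * se

def lognormal_bounds(estimate, se, level=0.95):
    """Bounds from a normal approximation on the log scale.

    Requires a positive estimate; both bounds are then positive,
    which suits scale and shape parameters.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    factor = math.exp(z * se / estimate)
    return estimate / factor, estimate * factor

# Illustrative values only: a point estimate of 2.0 with standard error 1.2.
print(normal_bounds(2.0, 1.2))
print(lognormal_bounds(2.0, 1.2))
```

In the illustrative call the standard error exceeds half of the estimate, so the normal bounds cross zero while the log-normal bounds stay positive, which is the situation the text describes.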

Simulation study
Randomness and uncertainty are the core features of probability, and randomness exists in every field of life. Simulation imitates the realization of a random experiment: pseudo-random values (which are in fact generated deterministically) are produced by an appropriate model designed on the basis of the random experiment. The simplest such model is a probability distribution that describes a real mechanism producing values of some quantity of interest.
Here, we carry out Monte Carlo simulation studies to assess the performance of the maximum likelihood estimates (MLEs) using the R programming language. The Monte Carlo simulations are run 1000 times and, in each replication, a random sample of size n is drawn from the W-TEXPD (α, β, θ). The model parameters are estimated by the maximum likelihood method. Table 1 presents the average point estimates of the three parameters with standard errors (SEs), biases and mean square errors (MSEs) for the sample sizes 20, 50, 100 and 200. A fixed seed is used to generate the random numbers, so that all results of these studies can be replicated exactly.
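The assessment is based on a simulation study that applies the following steps:
1. Generate one thousand samples of size n each using Eq (14).
2. Compute the MLEs for the one thousand samples.
3. Compute the SEs of the MLEs for the one thousand samples.
4. Compute the biases and mean square errors by using

$$\text{bias} = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat{\Theta}_i - \Theta\right), \qquad \text{MSE} = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat{\Theta}_i - \Theta\right)^2,$$

respectively, where Θ stands for any one of the model parameters and $\hat{\Theta}_i$ is its estimate in the i-th replication.

Table 1 shows that the biases and MSEs vary with n. The biases and MSEs for each parameter approach zero as the sample size increases.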
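The steps above can be sketched in Python as follows. Two simplifications are assumed and should be read as such: Eq (14) is taken to be the percentile relation x = τ + (α/θ){−log(1 − P)}^(1/β), and only the location estimate of Eq (29), the sample minimum, is tracked as a stand-in for the full Newton-Raphson fit of θ, α and β. The function names and the seed value are hypothetical, and the paper's own simulations were run in R.

```python
import math
import random

def wtexpd_quantile(p, tau, theta, alpha, beta):
    """Assumed percentile relation, used for inverse-transform sampling."""
    return tau + (alpha / theta) * (-math.log(1.0 - p)) ** (1.0 / beta)

def simulate_tau_hat(n, tau, theta, alpha, beta, reps=1000, seed=2024):
    """Bias and MSE of tau_hat = min(sample) over `reps` replications."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated exactly
    estimates = []
    for _ in range(reps):
        sample = [wtexpd_quantile(rng.random(), tau, theta, alpha, beta)
                  for _ in range(n)]
        estimates.append(min(sample))
    bias = sum(e - tau for e in estimates) / reps
    mse = sum((e - tau) ** 2 for e in estimates) / reps
    return bias, mse

# Illustrative true values (not those of Table 1).
for n in (20, 50, 100, 200):
    print(n, simulate_tau_hat(n, tau=1.0, theta=1.0, alpha=2.0, beta=1.5))
```

Extending the sketch to θ, α and β would require a numerical optimizer in place of the `min(sample)` line; the bias and MSE bookkeeping stays the same.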

Application 1: Aeronautical engineering
To demonstrate the strength of the W-TEXPD, we use an aeronautical engineering data set to show that the proposed distribution can be a better model than the baseline distributions, i.e. the Weibull, truncated Exponential (TEXPD), Gamma and Exponential distributions. We re-analyse the data extracted from [32] to illustrate our proposed model. The data given below are the failure times of the air conditioning system in an airplane. [33] used the Exponentiated Exponential distribution to model the same data and estimate its parameters. The data set of 30 observations is recorded as: 23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21, 42, 20, 5, 12, 120, 11, 3, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52 and 95. Table 2 reports descriptive statistics for the observations under study, which show that the data set is positively skewed and heavy tailed towards the right. Table 3 provides the ML estimates, along with standard errors, of the unknown parameters of the W-TEXPD, Weibull, Gamma, Exponential and Truncated Exponential (TEXPD) distributions. The negative log-likelihood, Akaike information criterion (AIC) and Bayesian information criterion (BIC) are computed to compare the models. The distribution with the largest log-likelihood (equivalently, the smallest negative log-likelihood) and the smallest AIC and BIC values is the better model among the fitted distributions. The values in Table 3 indicate that the proposed model has the largest log-likelihood and the lowest values of AIC and BIC, supporting the newly suggested distribution. Table 4 provides the values of different test statistics which are used to analyse the goodness of fit of the distributions. The distribution with the smallest values of the test statistics fits best. It is evident from the values in Table 4 that the W-TEXPD leads to a better fit than the Weibull, Gamma, Exponential and TEXPD distributions.
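The model-comparison criteria can be illustrated with the simplest baseline. The sketch below fits only the Exponential distribution, whose ML estimate is available in closed form, to the failure times listed in this section, and computes AIC = 2k − 2ℓ and BIC = k ln n − 2ℓ, where ℓ is the maximized log-likelihood and k the number of parameters. It does not reproduce Table 3; the function names are hypothetical.

```python
import math

# Failure times as listed in the text above.
failure_times = [23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21, 42, 20, 5,
                 12, 120, 11, 3, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52, 95]

def exponential_fit(data):
    """Closed-form ML fit of an Exponential(rate) model.

    Returns the estimated rate and the maximized log-likelihood."""
    n = len(data)
    rate = n / sum(data)
    loglik = n * math.log(rate) - rate * sum(data)
    return rate, loglik

def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

rate, loglik = exponential_fit(failure_times)
print("rate:", rate)
print("log-likelihood:", loglik)
print("AIC:", aic(loglik, k=1))
print("BIC:", bic(loglik, k=1, n=len(failure_times)))
```

The same two helper functions apply unchanged to any other fitted model once its maximized log-likelihood and parameter count are known.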
We study the efficacy of the proposed W-TEXPD graphically by sketching pdf, Q-Q, cdf and P-P plots for the above data set to check the goodness of fit. From Fig 3, it is evident from the pdf plot that the theoretical (predicted) probabilities are close to the empirical histogram, which highlights that the W-TEXPD fits the data set well. The corresponding cdf, Q-Q and P-P plots suggest the same. It is also observed that the W-TEXPD points follow the diagonal line closely.

PLOS ONE

Application 2: Electrical engineering
In this section, we use a real data set to show that the W-TEXPD distribution can be a better model than Weibull, truncated Exponential (TEXPD), Gamma and Exponential distributions.
The following data set, taken from [7], represents the failure times of 50 components (per 1000 h) [34]. Table 5 indicates that the data are highly spread, skewed and long right tailed. Table 6 provides the parameter estimates of the fitted distributions along with the log-likelihood (l), AIC and BIC values. Table 6 shows that the suggested model leads to a better fit than the baseline distributions, i.e. the Weibull, TEXPD, Gamma and Exponential distributions, for describing these data.
The smallest values of the goodness-of-fit statistics, i.e. Kolmogorov-Smirnov (K-S), Cramér-von Mises (C-Von) and Anderson-Darling (A-Darling), for the W-TEXPD in Table 7 indicate that the proposed model fits the given data best among the models considered.
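As an illustration of how such a goodness-of-fit statistic is obtained, the sketch below computes the Kolmogorov-Smirnov statistic, the largest vertical distance between the empirical cdf and a fitted cdf. The fitted cdf is passed in as a callable, so the same function serves any of the competing models. The function name and the toy data are hypothetical; the Cramér-von Mises and Anderson-Darling statistics are not shown.

```python
import math

def ks_statistic(data, cdf):
    """Largest vertical distance between the empirical cdf of `data`
    and a fitted cdf given as a callable."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical cdf jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical usage: an Exponential model with its ML rate on toy data.
toy = [0.3, 0.9, 1.4, 2.2, 4.1]
rate = len(toy) / sum(toy)
print(ks_statistic(toy, lambda x: 1.0 - math.exp(-rate * x)))
```

A smaller value means the fitted cdf stays closer to the empirical cdf, which is the sense in which the smallest statistic indicates the best fit.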
We study the performance of the W-TEXPD graphically by sketching pdf, Q-Q, cdf and P-P plots. Fig 4 shows that the empirical and theoretical lines are close, which indicates that the W-TEXPD fits the above data well.

Application 3: Mechanical engineering
To demonstrate the strength of the W-TEXPD, we fit our suggested model to the data extracted from [35]. The following observations are the numbers of revolutions (in millions) before failure of 23 ball bearings in a life testing experiment [28,36]. Table 8 reports descriptive statistics of the data under study, suggesting that the data are slightly skewed and right tailed. Table 9 provides the parameter estimates. A larger value of the log-likelihood (l) and smaller values of AIC and BIC reflect a better model. In this respect, it is evident from the statistics that the suggested model provides a better fit than the baseline distributions, i.e. the Weibull, TEXPD, Gamma and Exponential distributions, for these data. Table 10 gives the numerical values of the goodness-of-fit tests. We can conclude from these values that the suggested model fits the above data better.
The histogram with the superimposed pdf in Fig 5 also suggests that the W-TEXPD fits the data better. Similarly, the P-P plot supports the proposed model and gives evidence that it is a flexible probability model for this type of data.

Application 4: Bio-chemical engineering
This example consists of vinyl chloride data taken from clean up-gradient monitoring wells in μmg/L. [37] utilized the following data set to estimate the upper confidence
Table 11 displays descriptive statistics of the data under study, which show that the data set is skewed and right tailed.
The statistics in Table 12 show that the suggested W-TEXPD explains the vinyl chloride data better than the classical probability distributions. Table 13 shows the values of the goodness-of-fit test statistics. It is observed that the W-TEXPD has the smallest values, which establishes that this model fits best among the models considered.
One can clearly observe from the histogram and P-P plots of the W-TEXPD and the other models given in Fig 6 that the proposed distribution provides the best fit among the competing models for the vinyl chloride data.

Application 5: Environmental sciences
The data are taken from the book "Loss Distributions" [39]. In 1977 the following 40 losses, due to wind-related catastrophes, were recorded to the nearest $1,000,000. These data include only those losses of $2,000,000 or more and, for convenience, they have been ordered and recorded in millions. [40] fits the Alpha-Power Pareto distribution and [41] fits the Lomax exponential model to the same data. The observations in the data set are: 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 8, 8, 9, 15, 17, 22, 23, 24, 24, 25, 27, 32 and 43. Table 14 shows that there is a large variation in the data. Furthermore, the observations under study are positively skewed and flat towards the right. Table 15 provides the ML estimates, along with standard errors, of the unknown parameters for the W-TEXPD and the rest of the fitted models. The negative log-likelihood, Akaike information criterion (AIC) and Bayesian information criterion (BIC) are computed to compare the models. The values in Table 15 indicate that the proposed model is statistically better than the Weibull, Gamma, Exponential and Truncated Exponential (TEXPD) distributions. Table 16 provides the values of different test statistics which are used to analyze the goodness of fit of the distributions. The distribution with the smallest values of the test statistics fits best. It is evident from the values in Table 16 that the W-TEXPD gives a better fit than the Weibull, Gamma, Exponential and TEXPD distributions. The graphical study, which sketches pdf, Q-Q, cdf and P-P plots, shows the performance of the W-TEXPD in terms of goodness of fit. It is evident from Fig 7 that the observed probabilities plotted against the predicted probabilities are close and follow the diagonal line. Hence, it is concluded that the W-TEXPD is the best choice for modeling the above data.

Concluding remarks
In this research paper, we introduce a new four-parameter left-truncated distribution, called the Weibull-Truncated Exponential distribution (W-TEXPD), by employing a new generator. The objective of the present research is to provide a truncated model for finite data. In addition, the extra scale parameter has a significant impact on the shape of the distribution. A number of distributions arise as special cases of the proposed distribution.
It is demonstrated through real-life applications that the truncated distribution can be used quite effectively to model a variety of data sets from different fields. It is also concluded that our proposed model fits data comprising extreme and/or scattered values, as well as skewed (spread) and heavy-tailed (flat curved) data, better than the competing models. The W-TEXPD is effectively applied in engineering and environmental sciences, where this type of truncated data is commonly encountered. In future research, we will study and compare these estimators for censored data.