Modeling traumatic brain injury lifetime data: Improved estimators for the Generalized Gamma distribution under small samples

In this paper, from the practical point of view, we focus on modeling traumatic brain injury data considering different stages of hospitalization, related to patients’ survival rates following traumatic brain injury caused by traffic accidents. From the statistical point of view, the primary objective is related to overcoming the limited number of traumatic brain injury patients available for studying by considering different estimation methods to obtain improved estimators of the model parameters, which can be recommended to be used in the presence of small samples. To have a general methodology, at least in principle, we consider the very flexible Generalized Gamma distribution. We compare various estimation methods using extensive numerical simulations. The results reveal that the penalized maximum likelihood estimators have the smallest mean square errors and biases, proving to be the most efficient method among the investigated ones, mainly to be used in the presence of small samples. The Simulated Annealing technique is used to avoid numerical problems during the optimization process, as well as the need for good initial values. Overall, we considered an amount of three real data sets related to traumatic brain injury caused by traffic accidents to demonstrate that the Generalized Gamma distribution is a simple alternative to be used in this type of applications for different occurrence rates and risks, and in the presence of small samples.


Introduction
Gamma distribution plays an important role in statistics as one of the most used generalizations of the Exponential distribution due to its various special cases (such as Exponential and Chi-square). This distribution has been used in different scenarios, such as reliability engineering, environmental modeling, and health research, to list a few (see Louzada and Ramos [1] and the references therein). Stacy [2] proposed an important generalization of the Gamma a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 distribution to unify other relevant distributions, e.g., Weibull and Lognormal. This generalization, called Generalized Gamma (GG) distribution, has been successfully applied in diverse areas, such as reliability, data processing, and meteorology, among others (see Cox et al. [3] and the references therein). Moreover, it keeps the characteristic of incorporating only the support of a positive random variable T, and its probability density function (pdf) is given by f ðtj�; m; aÞ ¼ a Gð�Þ m a� t a�À 1 exp fÀ ðmtÞ a g; ð1Þ where t > 0, Gð�Þ ¼ R 1 0 e À t t �À 1 dt is the gamma function, α > 0 and ϕ > 0 are the shape parameters and μ > 0 is the scale parameter. The GG distribution includes various sub-models as special cases, such as the Log-Normal, Weibull, Gamma, Half-Normal, Nakagami-m, Rayleigh, Maxwell-Boltzmann, and Chi distributions.
Frequentist inference for the GG distribution has been widely considered in the literature. Stacy and Mihram [4] derived the maximum likelihood estimators (MLEs). However, Harger and Bain [5] later showed that the nonlinear equations obtained by the maximum likelihood approach are unstable. DiCiccio [6] discussed approximate conditional inference methods for this distribution. Huang and Hwang [7] used the method of moments to perform inference for the GG distribution. Furthermore, Khodabin and Ahmadabadi [8] compared the method of moment estimators and MLEs, whose results revealed that most of the time the MLEs showed greater performance even though under the presence of estimation limitations. Recently, Noufaily and Jones [9] discussed some different approaches to maximize the likelihood function; the proposed numerical technique returned smaller proportions of errors during the maximization process, but still failed in a significant number of samples, which is undesirable.

Overview of the TBI problematic
Trauma is a multisystem health condition that represents the third cause of death worldwide, surpassed only by cerebrovascular diseases and cancer [10]. It is estimated that over sixty million people have trauma each year, and nearly 16,000 people die every day after some traumatic injury. Traumatic Brain Injury (TBI) represents one of the significant causes of death and disability among the trauma epidemiology data [10,11]. Most of the patients with TBI are young, economically active adults and more likely to have been involved in a traffic accident [12][13][14][15][16]. Therefore, TBI is considered a public health concern that leads to high costs of hospitalization with various economic and social burdens.
The limited available data in the TBI problematic, usually presented in a small number of patients, motivated the current study, which relied on a data set of a longitudinal observational investigation of patients after TBI due to traffic accidents admitted to a Brazilian Emergency Department. Investigating optimal statistical analyses in this population is essential for providing impactful information applied not only to patients but also to their families, caregivers, and society in general [17].

Current statistical methods and their limitations
In the literature, there are various classical methods for estimating the unknown parameters of probability distributions. Under the frequentist approach, the primary interest is to compare the maximum likelihood estimation method with other estimation procedures. Related studies about different distributions have also been presented in the literature [18][19][20][21][22]. In this work, we consider several of these estimation procedures, such as least squares, the weighted least squares, the maximum product of spacings, and the Anderson-Darling maximum goodnessof-fit estimators.
An intensive simulation study is conducted to compare these estimation methods. However, we observe two significant problems. The first is that, for some methods, the estimation procedures fail in finding the target estimates, i.e., report convergence problems in the maximization/minimization process. The second is related to the occurrence of a significant bias in the obtained estimates for small samples. In order to overcome this problem, we propose a penalized maximum likelihood estimator, which, combined with a very useful practical algorithm called Simulated Annealing (SANN) (for further details, see Kirkpatrick et al. [23]), guarantees the best convergence not depending on the conditions of the problem (i.e., the initial values), even when it has several local extrema. Prentice [24] argued that the approximately normal distribution for ϕ, using the maximum likelihood theory, may not be achieved even for sample sizes equal to or larger than 400. Due to the asymptotic relationship of the maximum likelihood estimator with the penalized maximum likelihood estimator, the same problem may be observed. Therefore, we consider a bootstrap approach to building accurate confidence intervals (see DiCiccio and Efron [25]) for small and moderate samples. Finally, by combining all these approaches, one can perform inference for the flexible GG distribution with good precision even for small sample sizes.
The paper is organized as follows. Section 2 presents some properties of the GG distribution, including its cumulative distribution, survival and hazard functions, and its moments. Additionally, the SANN algorithm is also discussed in detail, and implementation procedures are presented. Section 3 discusses the eight estimation methods considered in this paper. Section 4 shows a simulation study, using synthetic data, designed to identify the most efficient estimation procedure. In Section 5, we apply our proposed methodology to three new real data sets provided by a medical school, which contain the TBI patients' lifetime risk among different hospitalization stages. Finally, some final comments are given in Section 6.

Background
Let T be a random variable with GG distribution, i.e. T~GG (ϕ, μ, α). Then, its cumulative distribution function (cdf) is given by where gðy; xÞ ¼ R x 0 w yÀ 1 e À w dw is called lower incomplete gamma function. The survival function is given by where Gðy; xÞ ¼ Glaser [26] showed that the hazard function (4) of the GG distribution can capture basic shapes, such as constant, increasing, decreasing, bathtub and unimodal. (initial temperature). The SANN algorithm can be described in a formal and general way as follows.
2. From the current solution, x i , generate a potential solution, x j , according to a specific generation scheme.
4. Update the value of the control parameter, k i , and set i = i + 1. Then, go to step 2.
Steps 2-4 are repeated, e.g., until the value of the control parameter, k i , is sufficiently small, or the same solution is repeatedly generated in many successive iterations.
Next, some useful remarks about the above algorithm are given.
2. In step 2, the potential solutions, x j 's, are randomly chosen within a range. For instance, through x j = x j + rV, where r is a uniformly distributed random number in the interval (−1, 1), i.e. r~U(−1, 1), and V is a vector (of length d) of step sizes. After s iterations, the SANN adjusts its search bounds for each variable so that 50% of all moves will be accepted, either enlarging these bounds to select a new ground to move to or shrinking them to a minimum.
3. In step 3, for the cases where Δg > 0, we generate u~U(0, 1) and move to x j only if u < p.
Thus, accepting worse solutions may prevent the process from becoming stuck at local minima.
4. In step 4, after m iterations, temperature (control parameter) k drops as k 0 = r k × k, where 0 � r k � 1 is the rate of temperature reduction given the initial annealing/cooling schedule. Usually, r k = 0.95.
5. As pointed out by Salter and Pearl [28], by the Markov chain theory, the SANN algorithm can be shown to converge to a stationary distribution for which the set of optimal solutions has probability 1, under certain conditions of both the sequence of control parameters and the generation scheme (see, e.g., Aarts and Korst [29], Haario and Saksman [30]).
Although presented above as a minimization problem, the SANN algorithm can be easily modified/adapted to cases where the interest resides on maximizing the function g(�).
Several variants of the SANN algorithm have been proposed in the recent literature. Among them, we can mention the relevant works of Torres-Jimenez and Rodriguez-Tello [31], Torres-Jimenez et al. [32], and Izquierdo-Marquez et al. [33].

Inference
In this section, we present different frequentist estimation methods to obtain the estimates for the parameters ϕ, μ, and α of the GG distribution.

Common estimators
The method of moments (MM) is one of the oldest procedures used for estimating parameters in statistical models. It is still widely used mainly because of its simplicity. For instance, for the two-parameter Gamma distribution, MM estimators have closed-form expressions.
Huang and Hwang [7] derived the MM estimators for the GG distribution, pointing out the need for solving two nonlinear equations, to find such estimators. Let t 1 , . . ., t n be a random sample of size n from T * GG(ϕ, μ, α). The moments estimators are obtained by solving where � t ¼ 1 n P n i¼1 t i and s ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi are the sample mean and standard deviation, respectively; while the estimate of ϕ can be obtained bŷ wherem MM andâ MM are obtained by solving the nonlinear equations in (7). The MM estimators of all GG model parameters do not have closed-form expressions, which is undesirable. Another disadvantage of this approach is that the authors did not discuss the asymptotic properties of the MM estimators. Therefore, no interval estimates for ϕ, μ and α can be constructed without further research. Another common procedure is to consider the ordinary least squares (OLS) estimators. The� OLS ,m OLS andâ OLS estimates can be obtained by minimizing, with respect to ϕ, μ and α, the following equation: where t (1) � t (2) � � � � � t (n) are the order statistics of a random sample of size n. Equivalently, these estimates can be obtained by solving the nonlinear equations: Note that the solution of Δ 1 (t (i) |ϕ, μ, α) involves a non-trivial partial derivative of the lower incomplete gamma function. However, this can be easily achieved numerically with high precision.
The weighted least squares (WLS) estimators,� WLS ,m WLS andâ WLS , can be obtained by minimizing with respect to ϕ, μ and α. These estimates can also be obtained by solving the nonlinear equations: for j = 1, 2, 3.

Maximum likelihood estimators
The maximum likelihood (ML) estimation method is widely used for the GG distribution due to the invariance and asymptotic properties of the obtained estimators. Let t = (t 1 ,. . ., t n ) 0 be a random sample of size n from a GG(ϕ, μ, α) population. Then, the likelihood function of (1) is given by The log-likelihood function of (14) is given by 'ð�; m; ajtÞ ¼ n log ðaÞ À n log ðGð�ÞÞ þ na� log ðmÞ þ ða� À 1Þ By solving the expressions: @ @� 'ð�; m; ajtÞ ¼ 0, @ @m 'ð�; m; ajtÞ ¼ 0 and @ @a 'ð�; m; ajtÞ ¼ 0, the following nonlinear equations can be obtained, respectively: n a þ n� log ðmÞ þ� where cðkÞ ¼ G 0 ðkÞ GðkÞ is the digamma function. The solutions of the above Eq (17) yield the MLEs. After some algebraic manipulations, we havê and the MLE of α is obtained by solving the nonlinear equation: and c 0 ðkÞ ¼ @ @k cðkÞ is the trigamma function.

Penalized maximum likelihood estimators
Firth [34] proved that the bias of MLEs can be reduced by considering a penalization in the likelihood function. Moreover, the author showed that in exponential families with canonical parameterization, the first-order term is removed by using the Jeffreys prior [35] as a penalization term. The Jeffreys prior for the GG distribution is computed by |I(ϕ, μ, α)| 1/2 , where |�| stands for the determinant of the Fisher information matrix (23), which results in ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi As previously stated, the first-order term related to the bias is removed in the case of distributions that belong to the exponential family. On the other hand, the GG distribution is not a member of the exponential family. However, this penalization also allows us to improve the estimates, even not ensuring that the improvement is of the first order. Note that when ϕ = 1, the GG distribution reduces to the Weibull distribution, for which the Jeffreys prior is given by π J (μ, α) / (μα) −1 . The extra α −1 helps us to decrease the bias of α; therefore, since π J (ϕ, μ, α) is not a function of α, we consider the following penalization: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi The penalized likelihood function of ϕ, μ and α, using the Jeffreys prior (25), is given by ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi The log-likelihood function of (26) is given by By solving the expressions: @ @� ' P ð�; m; ajtÞ ¼ 0, @ @m ' P ð�; m; ajtÞ ¼ 0 and @ @a ' P ð�; m; ajtÞ ¼ 0, the following nonlinear equations can be obtained, respectively: n À 1 a þ n� log ðmÞ þ� Note that one of the parameters can be isolated, in order to obtain two nonlinear equations. The three possible expressions are given bŷ From Eqs (31)-(33), we observe that, considering (32), the penalized maximum likelihood (PML) estimators are achieved with less computational effort. Therefore, we will consider the nonlinear Eqs (28) and (30), where� is obtained from (32). Although Firth [34] proved that the PML estimators obtained from the penalized likelihood or log-likelihood function in the exponential family of distributions are always finite and, in addition, always exist, the same cannot be done for the GG distribution. For this model, the MLEs can have no solution or several solutions (see Wingo [36]). This problem is observed computationally, since it is very complex to prove analytically. It is also complicated to demonstrate analytically the results for the PML estimators. Note that our main goals here are to propose a method to circumvent these computation difficulties, and achieve improved estimates for the parameters.
It is important to point out that it is not simple to check the regularity conditions necessary to ensure asymptotically normal distribution (see Lehman [37], Theorem 5.1, page 463). In fact, Prentice [24] showed that the approximate normal distribution for ϕ, using the ML theory, could not be achieved even for sample sizes equal to or larger than 400. This result can also be extended to the PML theory, since the MLEs and PML estimators are asymptotically equivalent. In order to overcome this problem, for small sample sizes, we considered the bootstrap approach presented by DiCiccio and Efron [25] to construct improved confidence intervals based on the PML estimates.

Maximum product of spacings estimators
As an alternative to the ML estimation method, the maximum product of spacings (MPS) is a robust method for estimating the unknown parameters of continuous univariate distributions. Cheng and Amin [38,39] introduced this method, which was independently developed by Ranneby [40] as an approximation to the Kullback-Leibler information measure. Moreover, Cheng and Amin [39] proved some desirable properties of the MPS estimators, such as asymptotic efficiency, invariance and, more importantly, the consistency of these estimators holds under more general conditions than for MLEs.

Anderson-Darling estimators
Here, we present one type of minimum distance estimators (also referred to as the maximum goodness-of-fit estimators), which is based on the Anderson-Darling statistic and, due to this, is known as the Anderson-Darling (AD) estimator. The AD estimates,� AD ,m AD andâ AD , of the GG model parameters ϕ, μ and α, are obtained by minimizing, with respect to ϕ, μ and α, the function Að�; m; aÞ ¼ À n À 1 n X n i¼1 2i À 1 ð Þ ½ log ðFðt ðiÞ j�; m; aÞÞ þ log ðSðt ðnþ1À iÞ j�; m; aÞÞ �: ð45Þ These estimates can also be obtained by solving the nonlinear equations: Turning now to a modified version of the AD statistic, the right-tail Anderson-Darling (RAD) estimates,� RAD ;m RAD andâ RAD , of the parameters ϕ, μ and α, are obtained by minimizing the function Fðt ðiÞ j�; m; aÞ À 1 n with respect to ϕ, μ and α. These estimates can also be obtained by solving the nonlinear equations: where Δ j (�|ϕ, μ, α), j = 1, 2, 3, are given in (11).

Simulation
In this section, we show the results of a simulation study carried out to compare the efficiency of the different frequentist methods used for estimating the three parameters of the GG distribution. Considering the proposed estimators, the following procedure was adopted: With this approach, the most efficient estimation method returns both bias and MSE closer to zero. The simulations were conducted using the R software [41]. For numerical optimization, we used the SANN algorithm, which was described in Section 2.1. Finally, the chosen values of the simulation parameters were: N = 20, 000, n = {20, 30, . . ., 300} and θ = {(0.5, 0.5, 3), (0.4, 1.5, 4)}. It is important to point out that the results of this simulation study were similar for other choices of θ. Since in real applications it is difficult to obtain good initial values, we assumed that the initial values are random and were generated from a uniform distribution on the interval (0, 4). Therefore, we also expect to obtain good estimates regardless of the initial values.
The estimation procedures needed to be performed under the same conditions to make the comparison meaningful. However, for some particular samples and estimation methods, the numerical techniques failed in finding the parameter estimates. Thus, we present the observed proportion of failures/errors of each method in Tables 1 and 2. As can be seen in these tables, there are high proportions of failures in the optimization process to find the estimates for the MM, OLS, and WLS methods, even for moderate sample sizes. Therefore, such estimation procedures should be avoided when estimating the parameters of the GG distribution. These methods were removed to avoid the inclusion of bias in the results to continue the simulation study. Hereafter, we consider the PML, ML, MPS, AD, and RAD estimators. Figs 2 and 3 present the bias and MSE of the estimates of ϕ, μ and α.
The horizontal lines in these figures correspond to bias and MSE equal to zero. In Figs 2 and 3, we observe that both the bias and MSE for all estimators tend to zero as n increases, i.e., the estimators are asymptotically unbiased and consistent for the parameters. The PML method returned improved estimates for the GG distribution when compared with the ML method. Moreover, the SANN algorithm allowed us to successfully find the estimates regardless of the initial values used for starting the optimization process. In this case, under the PML  method, all the generated samples returned satisfactory estimates, even for small sample sizes. Therefore, combining all simulation results with the useful properties of the PML estimators, such as asymptotic efficiency, normality, consistency, and invariance, we conclude that the PML estimators should be used for estimating the parameters of the GG distribution.

Applications
We applied the proposed statistical methods to a data set of male patients admitted to the Emergency Department of the Ribeirão Preto Medical School, University of São Paulo, Brazil, diagnosed with TBI due to car accidents (excluding patients who where less than 18 years old, or other neurological conditions). Only patients that were admitted and discharged alive were considered in this study. Thus, we did not consider censored data. We considered the length of stay in hospital in the survival function. Our main aim here was to check the average time that a patient stays in hospital, given that some time had already passed. For example, if a patient has been in hospital for ten days, how much longer do we expect him/her to take to be discharged? This problem is discussed in this section.
The TBI data set was firstly studied by Tavares [42]. The data acquisition period was from May 2004 to June 2005, and the male patients were analyzed using three different lifetime variables: the total amount of time (in days) spent in hospital (hereafter, data set D1); the amount of time at the Neurology Inpatient Department (data set D2); and during which time patients used Mechanical Ventilation (data set D3).
We compared the results obtained using the GG distribution with the corresponding ones achieved with the usage of other three-parameter lifetime distributions. The Generalized Weibull (GW) distribution (Mudholkar et al. [43]), with pdf given by f ðtjl; a; sÞ ¼ ðasÞ where l 2 R and α > 0 are the shape parameters and σ > 0 is the scale parameter. The Exponentiated Weibull (EW) distribution (Mudholkar et al. [44]), with pdf f ðtjy; a; sÞ ¼ ay s t s where θ > 0 and α > 0 are the shape parameters and σ > 0 is the scale parameter. The Marshall-Olkin Weibull (MOW) distribution (Marshall and Olkin [45]), with pdf given by where α > 0 and β > 0 are the shape parameters and λ > 0 is the scale parameter. Finally, the Extended Poisson-Weibull (EPW) distribution (Ramos et al. [46]), whose pdf is f ðtjl; a; bÞ ¼ albt aÀ 1 e À bt a À le À bt a 1 À e À l ; ð52Þ where l 2 R � and α > 0 are the shape parameters and β > 0 is the scale parameter.
The goodness-of-fit of the models was checked using the Kolmogorov-Smirnov (KS) test, which is based on the KS statistic: D n = sup|F n (t) − F(t|θ)|, where sup is the supremum of the set of distances, F n (t) is the empirical cdf and F(t|θ) is the cdf of the reference distribution. The KS hypothesis testing was conducted at the 5% level of significance, to reveal whether or not the data came from F(t|θ). In this case, the null hypothesis (i.e. the data came from F(t|θ)) is rejected if the returned p-value is smaller than 0.05. To carry out the model selection, the following discrimination criteria were adopted: AIC (Akaike Information Criterion) (Akaike [47]) and AICc (Corrected Akaike Information Criterion) (Sugiura [48]; Hurvich and Tsai [49]), which are computed by AIC ¼ À 2'ðθjtÞ þ 2c and AICc = AIC + 2 c (c + 1)/(n − c − 1), where c is the number of model parameters andθ is the estimate of θ. Given a set of candidate models for the data at hand, the best fitted model is the one that presents the minimum values of these criteria. Furthermore, in order to distinguish between two almost equally well-fitting models, Burnham and Anderson [50], page 70, give a rough rule of thumb for comparing AICs (as well as AIC variations, including AICc), based on the AIC differences, Δ w = AIC w − AIC min , where AIC w denotes the AIC value of the candidate model w and AIC min is the minimum of the AIC values. Thus, models with Δ w < 2 are all plausible, i.e. they have substantial support and should receive consideration in making inferences; models with 4 < Δ w < 7 have considerably less support; and finally, models with Δ w > 10 have either essentially no support and might be omitted from further consideration, or at least fail to explain some substantial explainable variation in the data. Table 3 shows some summary statistics for these lifetime variables/data sets. According to all statistics, patients would spend less time, in days, on the Mechanical Ventilation than in the Neurology Inpatient Department.
As can be seen in Table 3, we have a small number of observations for the different data sets, which constitutes a situation where the ML method may return high-biased estimates. However, such a problem can be easily overcome by considering the PML estimators, as shown in Section 4. In TBI, specifically, the patient follow-up is time-consuming and requires dedication in order to acquire the data. Thus, small data sets may be recurring and solutions should be provided. Consequently, some models were fitted, and according to the model selection criteria, the best-adjusted one was indicated. Table 4 provides the AIC and AICc values, as well as the p-values obtained from the KS test, for all five distributions (GG, GW, EW, MOW, and EPW) fitted using the PML estimation method. Observe that, for all three data sets, both criteria provide empirical evidence in favor of the GG distribution. However, the difference between the AIC (AICc) value for the EW model and the AIC (AICc) value for the GG model ("best" model according to both criteria) is less than two. Therefore, we can also consider the EW model as a plausible one for describing all the data sets. Followed up by Fig 4, which presents the fitted survival functions superimposed to the empirical survival function, it can be observed that the GG distribution gives a good fit to all data sets. Obtained results support elements towards the development of a decision-making system, using ad hoc evidence, generating its associated probabilistic function, which helps the expert to infer patients' risks. In addition to the point parameter estimates, we computed the confidence intervals, as well as the mean residual lifetime for the GG model parameters.
To construct such confidence intervals, one can use the asymptotic properties of the PML estimators. However, for the considered data sets, we have sample sizes smaller than 20. Prentice [24] showed that the approximate normal distribution for ϕ, using the ML theory, could not be achieved even for sample sizes equal to or larger than 400. Therefore, we considered a bootstrap approach to build such intervals (see DiCiccio and Efron [25]). It is essential to point out that the obtained bootstrap interval relies on a replication of small samples and the estimation of the related parameters, therefore the proposed approach that does not fail in finding such estimates plays an important role also in computing intervals. The PML estimates and the 95% bootstrap confidence intervals (CI) for the parameters ϕ, μ and α of the GG distribution, for all three data sets, are given in Table 5.  Following the interpretation related to Table 5, data set D1 showed a higher estimate for the parameter ϕ than the others, where the parameters μ and α behave in the opposite way. It is worth mentioning that it is hard to obtain a biological interpretation for the parameters since they influence the higher moments (e.g., the mean and variance) of the distribution simultaneously. Moreover, the obtained bootstrap confidence intervals returned accurate evidence even considering small sample sizes.
From the proposed methodology, the PML estimates were obtained with a satisfactory goodness of fit. With the adjusted parameters, one can solve the problem related to the expected time that will be taken for a patient to be discharged. In order to achieve that, we consider the mean residual lifetime of the GG distribution, which is given by rðtj�; m; aÞ ¼ 1 Sðtj�; m; aÞ Fig 5 shows the mean residual lifetime, considering the three data sets related to patients' TBI caused by a traffic accident. The plotted curves return the conditional expectation (r(t)) given the patient's spent time (t).
From this graph, we can see different expected times given that the patient has been in hospital. For instance, considering the patients from data set D1, given that one patient has been in hospital for ten days, we expect that he/she may be discharged after seventeen more days. On the other hand, if the patient is from data set D3 and has been hospitalized for ten days, then we expect that he/she will leave hospital after four more days.

Conclusions
In this paper, we derived and compared, through an extensive simulation study, different frequentist estimation methods for the parameters of the GG distribution. From our simulations, we observed that the OLS, WLS, and MM methods failed in finding the parameter estimates for a significant number of samples. On the other hand, considering the SANN algorithm with the PML estimation method, we were able to find the solutions (i.e., the parameter estimates) for all samples regardless of the initial values used for initiating the iterative procedure. Moreover, the PML method provided better estimates for all three parameters regardless of the sample size. Thus, the PML method is the most efficient estimation procedure, among the ones considered in this study, and should be used for all practical purposes.
Finally, our proposed methodology was illustrated in three real data sets related to patients' traumatic brain injury caused by a traffic accident, demonstrating that the GG distribution is a simple alternative to be used in such applications for different occurrence rates and risks, even under the presence of small samples.
Supporting information S1 File. Data are available downloading this file. (XLS)