Beyond the Sin-G family: The transformed Sin-G family

In recent years, the trigonometric families of continuous distributions have found a place of choice in the theory and practice of statistics, with the Sin-G family as leader. In this paper, we contribute to the subject by introducing a flexible extension of the Sin-G family, called the transformed Sin-G family. It is constructed from a new polynomial-trigonometric function presenting a desirable “versatile concave/convex” property, among others. The modelling possibilities of the former Sin-G family are thus multiplied. This potential is also highlighted by a complete theoretical work, showing stochastic ordering results, studying the analytical properties of the main functions, deriving several kinds of moments, and discussing the reliability parameter as well. Then, the applied side of the proposed family is investigated, with numerical results and applications of the related models. In particular, the estimation of the unknown model parameters is performed through the maximum likelihood method. Then, two real life data sets are analyzed with a new extended Weibull model derived from the considered trigonometric mechanism. We show that it outperforms seven comparable models, illustrating the importance of the findings.


Introduction
Recent advances in probability distribution theory and applications have seen the rise of various general families of distributions, successfully applied to different statistical problems. In this regard, a useful survey can be found in [1]. Here, we focus on the trigonometric families of continuous distributions, i.e., those defined by a cumulative distribution function (cdf) involving trigonometric functions (sine, cosine, tangent, cotangent, and various combinations of these). The pioneering work concerns the Sin-G family developed by [2][3][4][5]. As indicated by its name, it is defined around the sine function; the corresponding cdf is given by

$$F(x;z) = \sin\left[\frac{\pi}{2}G(x;z)\right], \quad x \in \mathbb{R}, \tag{1}$$

where G(x;z) is a baseline cdf of a continuous distribution with parameter(s) vector denoted by z. It is now established that the Sin-G family provides flexible statistical models able to fit data of various nature. Also, it is a simple alternative to the model derived from the baseline distribution, without any additional parameter. For instance, in [2], the exponential distribution is used as a baseline to construct the SinE model, which turns out to suitably fit the famous bladder cancer patients data of [6]. It also provides a better fit than some classical models such as the former exponential one, with better Akaike information criterion (AIC), Bayesian information criterion (BIC) and Kolmogorov-Smirnov (KS) test values. In addition, based on the inverse Weibull distribution (see [7]), the SinIW model was introduced by [4], with application to the so-called Guinea pigs data by [8], providing a better BIC in comparison to some other solid models. A freely available R package on the SinIW model is provided in [9].
As a matter of fact, the qualities of the models derived from the Sin-G family have inspired other general families of continuous distributions also centered around trigonometric functions, such as the Cos-G family by [5], CS-G family by [10], NSin-G family by [11], TransSC-G family by [12], SinTL-G family by [13], SinKum-G family by [14], and SinEOF-G family by [15]. Several of these families build on the Sin-G structure by adding tuning parameters or transformations.
In this paper, we go beyond the Sin-G family by proposing a new extended version of it, called the transformed Sin-G (TS-G) family. The corresponding cdf is derived from (1), with the use of a simple one-parameter polynomial-trigonometric transformation. This transformation has the following features: (i) it is analytically simple and includes the non-transformed case, (ii) it has the properties of a continuous cdf, that is, it takes its values in the unit interval, and is continuous, almost everywhere differentiable and increasing, and (iii) it can be convex, concave, or neither, for well-identified values of the parameter. Thanks to its versatility, this transformation significantly enhances the flexibility of (1), and of the baseline cdf as well. Thus, the TS-G family distinguishes itself from other modified Sin-G families by its overall simplicity, original polynomial-trigonometric functions, and the advantage of flexible kurtosis, skewness, versatile distribution tails, and various hazard rate shapes, as a result of the considered transformation. The TS-G family can therefore provide interesting models for diverse fitting purposes. This practical aspect, along with important theoretical results, is developed in this study.
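The rest of the paper is organized as follows. The basics of the TS-G family are presented in Section 2, with an emphasis on a special distribution of the family based on the Weibull distribution, motivated by its desirable shape characteristics in the modelling sense. In Section 3, properties of the TS-G family are studied, including stochastic ordering results, equivalence properties, critical points analysis, a series expansion involving known exponentiated functions, moments, and the reliability parameter. In Section 4, by adopting a statistical approach, the TS-G model parameters are estimated with the maximum likelihood method, supported by a simulation study. Then, applications of this special model are addressed in Section 5, showing how the new family can be of interest to fit various data sets, outperforming seven other solid extended or modified Weibull models of the literature. Section 6 formulates concluding remarks.

Proposition 1 Let λ ∈ [0, 1] and T_λ(x) be the following parametric function:

$$T_\lambda(x) = \sin\left(\frac{\pi}{2}x\right) - \lambda\frac{\pi}{2}x\cos\left(\frac{\pi}{2}x\right), \quad x \in [0, 1], \tag{2}$$

with T_λ(x) = 0 if x < 0 and T_λ(x) = 1 if x > 1. Then, the following properties hold:
• T_λ(x) has the properties of a continuous cdf,
• T_λ(x) is concave on [0, 1] for λ ∈ [0, 1/3], convex on [0, 1] for λ ∈ [1/2, 1], and can be neither convex nor concave for λ ∈ (1/3, 1/2).

Proof. We first recall the following inequality: for y ∈ [0, π/2],

$$y\cos(y) \le \sin(y) \tag{3}$$

(see [16]). Let us now prove the first point of the proposition. The function T_λ(x) is continuous, with T_λ(0) = 0 and T_λ(1) = 1. Also, since λ ∈ [0, 1], for x ∈ [0, 1], we have

$$\frac{dT_\lambda(x)}{dx} = (1-\lambda)\frac{\pi}{2}\cos\left(\frac{\pi}{2}x\right) + \lambda\left(\frac{\pi}{2}\right)^2 x\sin\left(\frac{\pi}{2}x\right).$$

As a sum of positive functions, we have dT_λ(x)/dx ≥ 0, so T_λ(x) is increasing. We conclude that T_λ(x) has the properties of a continuous cdf.

For the second point of the proof, let us notice that, by differentiating again with respect to x, we have

$$\frac{d^2T_\lambda(x)}{dx^2} = \left(\frac{\pi}{2}\right)^2\left[(2\lambda-1)\sin\left(\frac{\pi}{2}x\right) + \lambda\frac{\pi}{2}x\cos\left(\frac{\pi}{2}x\right)\right].$$

Therefore, if λ ∈ [0, 1/3], it follows from 2λ − 1 ≤ −1/3 and (3) that

$$\frac{d^2T_\lambda(x)}{dx^2} \le \frac{1}{3}\left(\frac{\pi}{2}\right)^2\left[\frac{\pi}{2}x\cos\left(\frac{\pi}{2}x\right) - \sin\left(\frac{\pi}{2}x\right)\right] \le 0.$$

That is, T_λ(x) is concave. On the other hand, if λ ∈ [1/2, 1], we have d²T_λ(x)/dx² ≥ 0 as a sum of positive functions, implying that T_λ(x) is convex. Now, for λ = 2/5 ∈ (1/3, 1/2), we have

$$\frac{d^2T_\lambda(x)}{dx^2} = \frac{1}{5}\left(\frac{\pi}{2}\right)^2\left[\pi x\cos\left(\frac{\pi}{2}x\right) - \sin\left(\frac{\pi}{2}x\right)\right],$$

which is positive at x = 1/2 and negative at x = 1, implying that T_λ(x) can be neither convex nor concave.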
As a visual approach, if we set U_λ(x) = d²T_λ(x)/dx², Fig 1 shows that U_λ(x) can be positive and negative, implying that T_λ(x) is neither convex nor concave for the considered values of λ. This concludes the proof of Proposition 1.

One can remark that the function T_λ(x) defined by (2) can be written as T_λ(x) = T*_λ{sin[(π/2)x]}, where

$$T^*_\lambda(x) = x - \lambda\arcsin(x)\sqrt{1-x^2}, \quad x \in [0, 1].$$

One can establish that the function T*_λ(x) has the properties of a cdf, which is not mentioned in the existing literature.
In view of Proposition 1, the transformation function T*_λ(x) allows one to "convexify (or not)" the concave cdf s(x) = sin[(π/2)x], x ∈ [0, 1], while keeping its cdf properties. This ability is not shared by some other simple transformation functions, such as the power transformation, i.e., T**_γ(x) = x^γ with γ > 0. This aspect is the driving force behind the TS-G family, which aims to expand the Sin-G family in a straightforward manner to open new statistical perspectives. The convex/concave properties of the function T_λ(x) given by (2) are also illustrated graphically.
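The curvature regimes discussed above can be checked numerically. The sketch below assumes the form T_λ(x) = sin(πx/2) − λ(π/2)x cos(πx/2) on [0, 1], which is the form consistent with the derivative argument of the proof; the function names and the grid size are illustrative choices, not part of the original work.

```python
import math

def t_lambda(x, lam):
    """Assumed form: T_lambda(x) = sin(pi x/2) - lam (pi/2) x cos(pi x/2) on [0, 1]."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    u = 0.5 * math.pi * x
    return math.sin(u) - lam * u * math.cos(u)

def t_lambda_second_derivative(x, lam):
    """Second derivative of the assumed T_lambda on [0, 1]."""
    u = 0.5 * math.pi * x
    return (0.5 * math.pi) ** 2 * ((2.0 * lam - 1.0) * math.sin(u) + lam * u * math.cos(u))

def curvature(lam, n_grid=1000):
    """Classify T_lambda on (0, 1) from the sign of its second derivative on a grid."""
    values = [t_lambda_second_derivative(i / n_grid, lam) for i in range(1, n_grid)]
    if all(v <= 0.0 for v in values):
        return "concave"
    if all(v >= 0.0 for v in values):
        return "convex"
    return "neither"
```

Calling `curvature` for a few values of λ in [0, 1/3], (1/3, 1/2) and [1/2, 1] gives a quick grid-based sanity check of the three regimes; it is a numerical illustration only and does not replace the analytical proof.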

Definition
Taking advantage of the flexibility of T_λ(x) given by (2), as described in Proposition 1, the proposed TS-G family of continuous distributions is defined by the following cdf:

$$F(x;\lambda,z) = \sin\left[\frac{\pi}{2}G(x;z)\right] - \lambda\frac{\pi}{2}G(x;z)\cos\left[\frac{\pi}{2}G(x;z)\right], \quad x \in \mathbb{R}, \tag{4}$$

where λ ∈ [0, 1] and, as usual, G(x;z) is a baseline cdf of a continuous distribution with parameter(s) vector denoted by z. That is, by considering the transformations T_λ(x) and T*_λ(x) discussed above, we have F(x;λ, z) = T_λ[G(x;z)] or, equivalently, F(x;λ, z) = T*_λ{sin[(π/2)G(x;z)]}, motivating the name of "transformed Sin-G family". One can notice that the cdf of the former Sin-G family is obtained by taking λ = 0. Also, based on Proposition 1 and the convex/concave properties of T*_λ(x), we argue that the overall flexibility of the cdf of the former Sin-G family given by (1) is enhanced. This is made concrete by the modulating polynomial-cosine term λ(π/2)G(x;z)cos[(π/2)G(x;z)], which is subtracted from the Sin-G cdf and widens the range of attainable shapes.
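Also, one can write F(x;λ, z) as a simple mixture of two cdfs of the TS-G family itself: F(x;0, z) and F(x;1, z), with the weights 1 − λ and λ, respectively, i.e.,

$$F(x;\lambda,z) = (1-\lambda)F(x;0,z) + \lambda F(x;1,z).$$

Hence, the role of λ is to balance F(x;0, z) and F(x;1, z), each reaching different targets in terms of statistical modelling. Among the other functions of interest, the survival function (sf) of the TS-G family is given by

$$S(x;\lambda,z) = 1 - \sin\left[\frac{\pi}{2}G(x;z)\right] + \lambda\frac{\pi}{2}G(x;z)\cos\left[\frac{\pi}{2}G(x;z)\right].$$

Upon an almost everywhere differentiation of F(x;λ, z) with respect to x, the corresponding probability density function (pdf) is given by

$$f(x;\lambda,z) = \frac{\pi}{2}g(x;z)\left\{(1-\lambda)\cos\left[\frac{\pi}{2}G(x;z)\right] + \lambda\frac{\pi}{2}G(x;z)\sin\left[\frac{\pi}{2}G(x;z)\right]\right\}, \tag{5}$$

where g(x;z) is the pdf of the baseline distribution, i.e., obtained by an almost everywhere differentiation of G(x;z). Another important function of the TS-G family, especially when the support of the baseline distribution is (0, +∞), is the hazard rate function (hrf) defined by

$$h(x;\lambda,z) = \frac{f(x;\lambda,z)}{S(x;\lambda,z)} = \frac{(\pi/2)g(x;z)\left\{(1-\lambda)\cos[(\pi/2)G(x;z)] + \lambda(\pi/2)G(x;z)\sin[(\pi/2)G(x;z)]\right\}}{1 - \sin[(\pi/2)G(x;z)] + \lambda(\pi/2)G(x;z)\cos[(\pi/2)G(x;z)]}.$$

For the importance of the sf and hrf, mainly in reliability analysis, we may refer the reader to [17], and the references therein.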
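As a minimal sketch of how the functions of the TS-G family can be evaluated for an arbitrary baseline, the code below takes the baseline cdf and pdf as callables. It assumes the cdf F = sin[(π/2)G] − λ(π/2)G cos[(π/2)G] and the pdf obtained by differentiating it; the unit exponential baseline and all function names are illustrative assumptions.

```python
import math

HALF_PI = 0.5 * math.pi

def tsg_cdf(x, lam, G):
    """TS-G cdf for a baseline cdf G (callable)."""
    u = HALF_PI * G(x)
    return math.sin(u) - lam * u * math.cos(u)

def tsg_pdf(x, lam, G, g):
    """TS-G pdf for a baseline cdf G and baseline pdf g (callables)."""
    u = HALF_PI * G(x)
    return HALF_PI * g(x) * ((1.0 - lam) * math.cos(u) + lam * u * math.sin(u))

def tsg_sf(x, lam, G):
    """Survival function, 1 - cdf."""
    return 1.0 - tsg_cdf(x, lam, G)

def tsg_hrf(x, lam, G, g):
    """Hazard rate function, pdf / sf (undefined where the sf is numerically zero)."""
    return tsg_pdf(x, lam, G, g) / tsg_sf(x, lam, G)

# Illustrative baseline (an assumption of this sketch): the unit exponential distribution.
def exp_cdf(x):
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0

def exp_pdf(x):
    return math.exp(-x) if x > 0.0 else 0.0
```

For instance, `tsg_cdf(0.7, 0.3, exp_cdf)` can be compared with `0.7 * tsg_cdf(0.7, 0.0, exp_cdf) + 0.3 * tsg_cdf(0.7, 1.0, exp_cdf)` to check the mixture representation numerically.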

A special distribution: The TSW distribution
Naturally, each choice for G(x;z) gives a new TS-G distribution. Here, we focus our attention on the Weibull distribution as baseline, with cdf G(x;α, β) specified in (7), where α > 0 and β > 0 are scale and shape parameters, respectively. The Weibull distribution is a well-known alternative to the exponential distribution, offering more flexible hazard rate shapes; both decreasing and increasing shapes can be observed. It has been used with success in a plethora of applications requiring the analysis of lifetime and reliability data. In this regard, we may refer the reader to [18][19][20].
We thus aim to extend the Weibull distribution, along with its properties, via the use of the TS-G family. That is, by inserting (7) into (4), we introduce the TSW distribution defined by the following cdf:

$$F(x;\lambda,\alpha,\beta) = \sin\left[\frac{\pi}{2}G(x;\alpha,\beta)\right] - \lambda\frac{\pi}{2}G(x;\alpha,\beta)\cos\left[\frac{\pi}{2}G(x;\alpha,\beta)\right] = \cos\left\{\frac{\pi}{2}[1-G(x;\alpha,\beta)]\right\} - \lambda\frac{\pi}{2}G(x;\alpha,\beta)\sin\left\{\frac{\pi}{2}[1-G(x;\alpha,\beta)]\right\}, \quad x > 0,$$

and F(x;λ, α, β) = 0 if x ≤ 0, where the second expression is obtained after some trigonometric manipulations. Also, the corresponding sf, pdf and hrf are obtained by taking G(x;z) = G(x;α, β) in the sf, pdf and hrf of the TS-G family, with f(x;λ, α, β) = 0 if x ≤ 0. After some graphical investigations, the curvature properties of the functions of the TSW distribution prove to be desirably versatile. Evidence can be seen in Fig 3, which displays some plots of the corresponding pdf and hrf for various values of the parameters.
In particular, Fig 3(a) indicates that the pdf of the TSW distribution has various skewness shapes (near symmetrical, left, right, bathtub, reversed-J shapes, mainly), along with different kurtosis properties. Fig 3(b) reveals that the corresponding hrf possesses versatile shapes, such as decreasing, increasing, bathtub (classic and upside-down) and reversed-J shapes. These observations imply that the TSW distribution is adequate to fit heterogeneous data sets. In our study, this aspect will be developed in Section 5, where the TSW distribution is used to fit two real life data sets. Also, it will be compared with other extended or modified Weibull models, and the results will be quite favorable to the TSW model.
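To experiment with the shapes described above, the following sketch evaluates the TSW cdf, pdf and hrf and draws random values by inverse transform. It assumes the Weibull parametrization G(x;α, β) = 1 − exp(−αx^β) for x > 0, which is one common convention and may differ from the one used in (7); the function names and the bisection-based inversion are likewise illustrative choices.

```python
import math
import random

HALF_PI = 0.5 * math.pi

def weibull_cdf(x, alpha, beta):
    # Assumed parametrization: G(x) = 1 - exp(-alpha * x**beta) for x > 0.
    return -math.expm1(-alpha * x ** beta) if x > 0.0 else 0.0

def weibull_pdf(x, alpha, beta):
    return alpha * beta * x ** (beta - 1.0) * math.exp(-alpha * x ** beta) if x > 0.0 else 0.0

def tsw_cdf(x, lam, alpha, beta):
    u = HALF_PI * weibull_cdf(x, alpha, beta)
    return math.sin(u) - lam * u * math.cos(u)

def tsw_pdf(x, lam, alpha, beta):
    u = HALF_PI * weibull_cdf(x, alpha, beta)
    return HALF_PI * weibull_pdf(x, alpha, beta) * ((1.0 - lam) * math.cos(u) + lam * u * math.sin(u))

def tsw_hrf(x, lam, alpha, beta):
    # Ratio pdf / sf; not defined where the sf is numerically zero (far right tail).
    return tsw_pdf(x, lam, alpha, beta) / (1.0 - tsw_cdf(x, lam, alpha, beta))

def tsw_quantile(p, lam, alpha, beta, n_iter=60):
    """Invert the cdf: bisection for v in [0, 1] with T_lambda(v) = p, then the Weibull quantile of v."""
    lo, hi = 0.0, 1.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        u = HALF_PI * mid
        if math.sin(u) - lam * u * math.cos(u) < p:
            lo = mid
        else:
            hi = mid
    v = min(0.5 * (lo + hi), 1.0 - 1e-15)  # keep the logarithm finite
    return (-math.log1p(-v) / alpha) ** (1.0 / beta)

def tsw_sample(n, lam, alpha, beta, seed=None):
    """Inverse-transform sampling from the TSW distribution."""
    rng = random.Random(seed)
    return [tsw_quantile(rng.random(), lam, alpha, beta) for _ in range(n)]
```

Evaluating `tsw_pdf` and `tsw_hrf` on a grid of x values for several triples (λ, α, β) is one way to reproduce plots in the spirit of Fig 3.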

Notable mathematical properties
Here, we explore some mathematical properties of interest satisfied by the TS-G family.

Stochastic ordering results
Stochastic ordering results are crucial to understanding the hierarchy existing between distributions, with consequences for their comparison from the modelling point of view. In the framework of the TS-G family, the following result presents some relations involving the cdf of the TS-G family (beyond the following immediate stochastic ordering property: F(x;λ, z) ≤ F(x;0, z)).

Proposition 2
The following inequalities hold:
Proof. Based on (4), since λ_2 ≥ λ_1 and the involved functions are positive, we have λ_2(π/2)G(x;z)cos[(π/2)G(x;z)] ≥ λ_1(π/2)G(x;z)cos[(π/2)G(x;z)], implying the desired inequality. For the second point, the following inequality holds: for y ∈ [0, π/2], we have sin(y) ≥ y(2/π) (see [16]). Hence, based on (4), the sine term satisfies sin[(π/2)G(x;z)] ≥ G(x;z), from which the second inequality is derived.
Then, one can remark that F*(x;z) is the cdf of the rv Z = max(X, Y), where X is a rv having the (baseline) cdf G(x;z) and Y is a rv having the cdf of the Cos-G family (see [5]), with X and Y independent.
The following result is about a likelihood stochastic ordering of the TS-G family. We refer the reader to [21] for the details on the concept of likelihood stochastic order.
Proposition 3 Let X_1 be a rv having the cdf F(x;λ_1, z) and X_2 be a rv having the cdf F(x;λ_2, z). Then, if λ_2 ≥ λ_1, we have X_1 ≤ X_2 in the likelihood stochastic ordering sense.
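The reasoning behind this ordering can be sketched as follows. This is a reconstruction under stated assumptions, not necessarily the original proof: we write u = (π/2)G(x;z), assume g(x;z) > 0 on the support, and assume that the TS-G pdf is proportional to g(x;z){(1 − λ)cos(u) + λu sin(u)}. The ratio of the two pdfs then depends on x only through t = u tan(u), which is non-decreasing in x for u ∈ [0, π/2).

```latex
\[
\frac{f(x;\lambda_2,z)}{f(x;\lambda_1,z)}
= \frac{(1-\lambda_2)\cos u + \lambda_2\, u \sin u}{(1-\lambda_1)\cos u + \lambda_1\, u \sin u}
= \frac{(1-\lambda_2) + \lambda_2 t}{(1-\lambda_1) + \lambda_1 t},
\qquad t = u\tan u,
\]
\[
\frac{d}{dt}\left[\frac{(1-\lambda_2) + \lambda_2 t}{(1-\lambda_1) + \lambda_1 t}\right]
= \frac{\lambda_2 - \lambda_1}{\left[(1-\lambda_1) + \lambda_1 t\right]^2} \ge 0
\quad \text{if } \lambda_2 \ge \lambda_1 .
\]
```

Under these assumptions, the ratio f(x;λ_2, z)/f(x;λ_1, z) is non-decreasing in x whenever λ_2 ≥ λ_1, which is the defining property of the likelihood ratio order between X_1 and X_2.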

Equivalence properties
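Here, some equivalence properties of crucial functions of the TS-G family are discussed, which can be helpful to find their limits and to understand the tail properties of the distribution. As G(x;z) → 0, for λ ∈ [0, 1), we establish that

$$F(x;\lambda,z) \sim (1-\lambda)\frac{\pi}{2}G(x;z), \quad f(x;\lambda,z) \sim (1-\lambda)\frac{\pi}{2}g(x;z), \quad h(x;\lambda,z) \sim (1-\lambda)\frac{\pi}{2}g(x;z).$$

As G(x;z) → 1, for λ ∈ (0, 1], we have

$$S(x;\lambda,z) \sim \lambda\frac{\pi^2}{4}[1-G(x;z)], \quad f(x;\lambda,z) \sim \lambda\frac{\pi^2}{4}g(x;z), \quad h(x;\lambda,z) \sim \frac{g(x;z)}{1-G(x;z)}.$$

In each case, we see how the new parameter λ modulates the limits; it has a strong effect in this regard, except for the hrf when G(x;z) → 1.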

A series expansion
The following result establishes a new representation of the pdf of the TS-G family involving exponentiated baseline pdfs. Such results are common for the pdfs of modern general families of continuous distributions (see, e.g., [4,11,22]).
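Proposition 4 The pdf of the TS-G family can be expanded as

$$f(x;\lambda,z) = \sum_{k=0}^{+\infty} c_k\, h_{2k+1}(x;z), \qquad c_k = \frac{(-1)^k}{(2k+1)!}\left(\frac{\pi}{2}\right)^{2k+1}[1-\lambda(2k+1)],$$

where h_{2k+1}(x;z) = (2k + 1)g(x;z)G(x;z)^{2k} is the pdf of the exponentiated baseline distribution with power parameter 2k + 1.

Proof. Owing to the series expansions of the sine and cosine functions, after some developments, we get

$$F(x;\lambda,z) = \sin\left[\frac{\pi}{2}G(x;z)\right] - \lambda\frac{\pi}{2}G(x;z)\cos\left[\frac{\pi}{2}G(x;z)\right] = \sum_{k=0}^{+\infty}\frac{(-1)^k}{(2k+1)!}\left(\frac{\pi}{2}\right)^{2k+1}G(x;z)^{2k+1} - \lambda\sum_{k=0}^{+\infty}\frac{(-1)^k}{(2k)!}\left(\frac{\pi}{2}\right)^{2k+1}G(x;z)^{2k+1} = \sum_{k=0}^{+\infty} c_k\, G(x;z)^{2k+1}.$$

We end the proof of Proposition 4 by differentiating the above function with respect to x.

Proposition 4 is of interest because the properties of most of the exponentiated standard distributions are well known, and thus can be used to determine those of the TS-G family. Also, from the practical point of view, it allows us to express some integral terms by means of (infinite) sums, which can sometimes be less error-prone than computing the integral directly. In this regard, we refer to the discussion in [22].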
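The speed at which such a series converges can be examined numerically. The sketch below compares the closed-form cdf with its truncated power series in G, assuming the coefficients c_k = (−1)^k (π/2)^(2k+1) [1 − λ(2k + 1)]/(2k + 1)!, obtained here from the Taylor series of sine and cosine; the function names and the truncation orders are illustrative.

```python
import math

def series_coefficient(k, lam):
    """c_k = (-1)^k (pi/2)^(2k+1) [1 - lam (2k+1)] / (2k+1)!  (derived in this sketch)."""
    m = 2 * k + 1
    return (-1) ** k * (0.5 * math.pi) ** m * (1.0 - lam * m) / math.factorial(m)

def tsg_cdf_closed_form(G, lam):
    """Closed-form cdf as a function of the baseline cdf value G in [0, 1]."""
    u = 0.5 * math.pi * G
    return math.sin(u) - lam * u * math.cos(u)

def tsg_cdf_series(G, lam, K=10):
    """Series truncated at order K: sum_{k=0}^{K} c_k G^(2k+1)."""
    return sum(series_coefficient(k, lam) * G ** (2 * k + 1) for k in range(K + 1))

def max_truncation_error(lam, K, n_grid=200):
    """Largest absolute gap between the truncated series and the closed form on a grid of G values."""
    return max(
        abs(tsg_cdf_series(i / n_grid, lam, K) - tsg_cdf_closed_form(i / n_grid, lam))
        for i in range(n_grid + 1)
    )
```

Because of the factorial in the denominator, a moderate truncation order is typically enough in this sketch; `max_truncation_error(lam, K)` gives a direct way to choose K for a target accuracy.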

Generalities on the moments
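Let X be a rv having the cdf F(x;λ, z) given by (4) (and the pdf f(x;λ, z) given by (5)) and ϕ(x) be a function. Then, assuming that it makes mathematical sense, the expectation of ϕ(X) is obtained as

$$\Theta_\phi(X) = E[\phi(X)] = \int_{-\infty}^{+\infty}\phi(x)f(x;\lambda,z)\,dx = (1-\lambda)\frac{\pi}{2}\int_0^1 \phi[Q_G(u;z)]\cos\left(\frac{\pi}{2}u\right)du + \lambda\left(\frac{\pi}{2}\right)^2\int_0^1 u\,\phi[Q_G(u;z)]\sin\left(\frac{\pi}{2}u\right)du,$$

where Q_G(u;z) denotes the quantile function of the baseline distribution. These two integrals can sometimes be determined analytically, depending on the complexity of the function ϕ[Q_G(u;z)]. In all situations, for a given baseline cdf and λ, Θ_ϕ(X) can be calculated by means of numerical techniques, implemented in any mathematical software.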
Also, for an alternative analytical treatment, Proposition 4 implies that

$$\Theta_\phi(X) = \sum_{k=0}^{+\infty} c_k \int_{-\infty}^{+\infty}\phi(x)\,h_{2k+1}(x;z)\,dx,$$

where c_k denotes the coefficients of the expansion in Proposition 4 and h_{2k+1}(x;z) the exponentiated baseline pdf with power parameter 2k + 1. For practical purposes, the sum can be truncated at a large enough integer K, providing a suitable approximation of Θ_ϕ(X). Some derivations of Θ_ϕ(X) are presented in Table 1, which follow from several specific choices of ϕ(x). As an example of application, the m-th raw moment of a rv X following the TSW distribution can be derived from (8) and the m-th raw moment of the exponentiated Weibull distribution with power parameter 2k + 1 as established in [26].
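As a numerical illustration of the moment calculations, the sketch below approximates E[ϕ(X)] with a midpoint rule applied to the two integrals over (0, 1) that arise after the change of variable u = G(x;z). It assumes the pdf (π/2)g{(1 − λ)cos[(π/2)G] + λ(π/2)G sin[(π/2)G]} and, for the example baseline, the Weibull parametrization G(x) = 1 − exp(−αx^β); the names, default parameter values and grid size are illustrative.

```python
import math

HALF_PI = 0.5 * math.pi

def tsg_expectation(phi, baseline_quantile, lam, n=20000):
    """Midpoint-rule approximation of E[phi(X)] through the two quantile integrals on (0, 1)."""
    first = 0.0
    second = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        value = phi(baseline_quantile(u))
        first += value * math.cos(HALF_PI * u)
        second += u * value * math.sin(HALF_PI * u)
    first /= n
    second /= n
    return (1.0 - lam) * HALF_PI * first + lam * HALF_PI ** 2 * second

def weibull_quantile(u, alpha=1.0, beta=2.0):
    # Assumed parametrization: G(x) = 1 - exp(-alpha * x**beta), so Q(u) = (-log(1 - u) / alpha)**(1 / beta).
    return (-math.log1p(-u) / alpha) ** (1.0 / beta)
```

For example, `tsg_expectation(lambda x: x ** 2, weibull_quantile, 0.3)` approximates the second raw moment for the assumed baseline; taking ϕ(x) = 1 should return a value close to 1, which is a convenient sanity check.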

Reliability parameter
The general definition of the reliability parameter can be formulated as follows. Let X_1 and X_2 be two continuous rvs that can be compared based on a scenario that makes sense in a random system. Then, the corresponding reliability parameter, denoted by R, is defined as a double integral of f(x, y;ξ), the joint pdf of (X_1, X_2), with ξ as parameter(s) vector, over the region where one variable exceeds the other. Details and applications of R in a concrete setting can be found in [27,28], and the references therein.
The following result concerns the expression of R for the TS-G family in a specific setting.
Proposition 5 Let X_1 and X_2 be two independent rvs having the cdfs F(x;λ_1, z) and F(x;λ_2, z), respectively. Then, R has a closed-form expression depending only on λ_1 and λ_2.
Proof. Owing to the independence of X_1 and X_2, the joint pdf factorizes, and R reduces to a single integral involving (4) and (5), which is evaluated after some integral calculus. This ends the proof of Proposition 5.
In Proposition 5, when X_1 and X_2 are identically distributed, i.e., λ_1 = λ_2, we get R = 1/2. Also, Proposition 5 is useful to have a simple estimate of R based on estimates of λ_1 and λ_2. Indeed, if λ̂_1 and λ̂_2 are estimates of λ_1 and λ_2, respectively, then the plug-in approach suggests estimating R by substituting λ̂_1 and λ̂_2 for λ_1 and λ_2 in the expression of Proposition 5. However, more research into the application of this formula to real-world data is needed.
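To make the dependence of R on λ_1 and λ_2 concrete, the sketch below adopts the convention R = P(X_2 < X_1), which is an assumption of this example, as is the TS-G cdf form sin[(π/2)G] − λ(π/2)G cos[(π/2)G]. Under these assumptions, the change of variable v = G(x;z) removes the baseline, and a direct integration gives R = 1/2 + (λ_1 − λ_2)(π² − 4)/16; this closed form is derived here for illustration and is compared with a midpoint-rule evaluation. With the opposite convention, the sign of the second term flips.

```python
import math

HALF_PI = 0.5 * math.pi

def t_cdf(v, lam):
    """T_lambda(v): the TS-G cdf expressed in the baseline probability v = G(x; z)."""
    u = HALF_PI * v
    return math.sin(u) - lam * u * math.cos(u)

def t_pdf(v, lam):
    """Derivative of T_lambda(v) with respect to v."""
    u = HALF_PI * v
    return HALF_PI * ((1.0 - lam) * math.cos(u) + lam * u * math.sin(u))

def reliability_numeric(lam1, lam2, n=20000):
    """Midpoint-rule value of int_0^1 T_{lam2}(v) T'_{lam1}(v) dv, i.e. P(X2 < X1)."""
    return sum(t_cdf((i + 0.5) / n, lam2) * t_pdf((i + 0.5) / n, lam1) for i in range(n)) / n

def reliability_closed_form(lam1, lam2):
    """R = 1/2 + (lam1 - lam2)(pi^2 - 4)/16 under the assumed convention R = P(X2 < X1)."""
    return 0.5 + (lam1 - lam2) * (math.pi ** 2 - 4.0) / 16.0
```

A plug-in estimate in this setting is simply `reliability_closed_form(lam1_hat, lam2_hat)`, where `lam1_hat` and `lam2_hat` stand for estimates of λ_1 and λ_2.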

Maximum likelihood estimation
Here, an inferential study of the TS-G family is proposed, estimating the parameters of the TS-G model by the maximum likelihood method.

The basics
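The maximum likelihood method is commonly employed in parametric estimation because of its overall simplicity and the theoretical guarantees ensuring strong convergence properties of the obtained estimates. In this regard, a comprehensive treatment can be found in [29]. We may also refer to [30][31][32] for modern applications of this method. In the context of the TS-G family, the maximum likelihood estimates (MLEs) of the parameters are obtained by maximizing the log-likelihood function built from the pdf (5) with respect to λ and z.

For the TSW model, a simulation study is conducted with two sets of parameters: S_1: (λ = 0.3, α = 3, β = 5) and S_2: (λ = 0.1, α = 3.5, β = 4.5). We also calculate the empirical mean squared errors (MSEs) of the MLEs defined as, for h = λ, α, β,

$$\mathrm{MSE}_h = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{h}_i - h\right)^2,$$

where N denotes the number of generated samples and the index i refers to the i-th generated sample. The results of this simulation study are presented in Figs 4 and 5 for S_1 and S_2, respectively.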
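The mechanics of such a simulation study can be sketched in a deliberately simplified setting: here α and β are held fixed at their true values and only λ is estimated, whereas the study above estimates all three parameters. The sketch also assumes the Weibull parametrization G(x) = 1 − exp(−αx^β); the sample sizes, number of replications and function names are hypothetical. Since the pdf is linear in λ, the log-likelihood is concave in λ, so a ternary search on [0, 1] is sufficient in this reduced problem.

```python
import math
import random

HALF_PI = 0.5 * math.pi

def sample_tsw(n, lam, alpha, beta, rng):
    """Inverse-transform sampling: solve T_lambda(v) = U by bisection, then invert the Weibull cdf."""
    out = []
    for _ in range(n):
        p, lo, hi = rng.random(), 0.0, 1.0
        for _ in range(50):
            mid = 0.5 * (lo + hi)
            u = HALF_PI * mid
            if math.sin(u) - lam * u * math.cos(u) < p:
                lo = mid
            else:
                hi = mid
        v = min(0.5 * (lo + hi), 1.0 - 1e-15)
        out.append((-math.log1p(-v) / alpha) ** (1.0 / beta))
    return out

def loglik_lambda(lam, data, alpha, beta):
    """Log-likelihood as a function of lam, up to terms that do not depend on lam."""
    total = 0.0
    for x in data:
        u = HALF_PI * (-math.expm1(-alpha * x ** beta))
        total += math.log((1.0 - lam) * math.cos(u) + lam * u * math.sin(u))
    return total

def mle_lambda(data, alpha, beta, n_iter=40):
    """Ternary search on [0, 1]; valid because the log-likelihood is concave in lam."""
    lo, hi = 0.0, 1.0
    for _ in range(n_iter):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if loglik_lambda(m1, data, alpha, beta) < loglik_lambda(m2, data, alpha, beta):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def empirical_mse(n, lam, alpha, beta, n_rep=20, seed=0):
    """Empirical MSE of the estimate of lam over n_rep simulated samples of size n."""
    rng = random.Random(seed)
    errors = [
        (mle_lambda(sample_tsw(n, lam, alpha, beta, rng), alpha, beta) - lam) ** 2
        for _ in range(n_rep)
    ]
    return sum(errors) / n_rep
```

Calling `empirical_mse` for increasing values of `n` under a setting such as (λ = 0.3, α = 3, β = 5) mimics, in this reduced one-parameter problem, the kind of MSE curves reported in the simulation study; a full analysis would maximize the log-likelihood jointly in λ, α and β with a numerical optimizer.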

PLOS ONE
As a first observation, we see that, in all the situations, when the sample size increases, the empirical MSEs approach the axis y = 0. This illustrates the "numerical convergence" of the MLEs to the true values of the parameters.

Applications
Thanks to its desirable flexible properties, the TSW model is intended to be applied in concrete scenarios, such as the fit of real life data. We illustrate this by considering the following two well-referenced real life data sets.

"The first data set". The first data set finds its source in [28]. It [3].
As goodness-of-fit criteria to compare these models, we chose the Cramér-von Mises (CVM), Anderson-Darling (AD) and KS statistics, with the corresponding KS p-values. Also, the AIC is calculated. For the use of the AIC in applied frameworks, one may refer to [39][40][41]. The general rule is the following: the smaller the values of the CVM, AD, KS statistics and AIC, and the larger the KS p-value, the better the fit of the corresponding model to the considered data. The R software is used.

Tables 2 and 3 list the values of the CVM, AD, KS with p-value, and the MLEs and their corresponding SEs of the model parameters for the first and second data sets, respectively. Tables 2 and 3 indicate that the smallest CVM, AD and KS and the largest KS p-value are obtained for the TSW model; it is the best model according to the considered criteria. In particular, it outperforms the former SW model corresponding to λ = 0. Indeed, we see that the parameter λ of the TSW model is estimated "far from zero", i.e., the corresponding MLEs are λ̂ = 0.7896 and λ̂ = 0.7440, for the first and second data set, respectively. This points out the importance of the transformed sine technique to obtain suitable fits of these data, in comparison to the former SW model. Tables 4 and 5 present the minus estimated log-likelihood, i.e., −ℓ̂ = −ℓ(λ̂, α̂, β̂) for the TSW model, and the AIC values of the models for the first and second data set, respectively.
According to Tables 4 and 5, since it has the lowest AIC for the two data sets, the TSW model can be considered as the best one.
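For readers who wish to compute such criteria themselves, the sketch below implements the textbook forms of the KS, CVM and AD statistics from the fitted cdf evaluated at the sorted data, together with the AIC computed from the minus log-likelihood. These are the standard unmodified formulas; statistical packages sometimes report modified versions, so the values need not coincide with those in Tables 2 to 5. The function names are illustrative.

```python
import math

def fitted_cdf_values(data, cdf):
    """Sorted values u_(1) <= ... <= u_(n) of the fitted cdf evaluated at the data."""
    return sorted(cdf(x) for x in data)

def ks_statistic(u):
    """Kolmogorov-Smirnov distance between the empirical cdf and the fitted cdf."""
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))

def cvm_statistic(u):
    """Cramer-von Mises statistic W^2 = 1/(12n) + sum_i (u_(i) - (2i - 1)/(2n))^2."""
    n = len(u)
    return 1.0 / (12.0 * n) + sum((ui - (2 * i + 1) / (2.0 * n)) ** 2 for i, ui in enumerate(u))

def ad_statistic(u):
    """Anderson-Darling statistic A^2 = -n - (1/n) sum_i (2i - 1)[ln u_(i) + ln(1 - u_(n+1-i))]."""
    n = len(u)
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i])) for i in range(n))
    return -n - s / n

def aic(neg_loglik, n_params):
    """AIC = 2k + 2(-loglik), where neg_loglik is the minus estimated log-likelihood."""
    return 2.0 * n_params + 2.0 * neg_loglik
```

In use, `u = fitted_cdf_values(data, fitted_cdf)` is computed once for each candidate model, where `fitted_cdf` is that model's cdf with the parameters replaced by their MLEs, and the three statistics are then evaluated on `u`.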
We now provide a graphical visualization of the fitting results of the TSW model. That is, Figs 6 and 7 display several fits of the TSW model. In particular, the histograms of both data sets are plotted, along with the curves of the corresponding estimated pdfs, i.e., f(x;λ̂, α̂, β̂); the curves of the estimated cdfs, i.e., F(x;λ̂, α̂, β̂), are plotted over those of the corresponding empirical cdfs of the data; the curves of the estimated sfs, i.e., S(x;λ̂, α̂, β̂), are plotted over those of the corresponding empirical sfs of the data; and Probability-Probability (P-P) plots are provided.
In all the graphics, we see that the red curves closely follow the corresponding black curves, attesting to the efficiency of the TSW model in this data fitting exercise.

Concluding remarks
Based on a new one-parameter transformation function, we provide an original extension of the Sin-G family of continuous distributions, introducing the transformed Sin-G (TS-G) family. We discuss how the additional parameter λ can enhance the flexibility of the cdf of the former Sin-G family, with positive consequences for modelling purposes. An emphasis is put on the transformed Sin Weibull (TSW) distribution, showing a high potential in the analysis and modelling of lifetime data. Some general mathematical features of the TS-G family are established. Then, a statistical approach is adopted; the maximum likelihood estimates (MLEs) for the TS-G model parameters are discussed. The TSW model is highlighted: for the two data sets considered, it provides a better fit than seven rival models, some of which have more parameters. Thanks to these qualities, the TS-G family can find broader use in areas dealing with modern data. For example, it can be used to construct models in multivariate analysis, regression, classification, and other statistical fields of importance. In addition, the transformation T_λ(x) or T*_λ(x) can be used to efficiently extend other existing families of distributions. These perspectives require additional developments, which we plan to address in future works.