Min-max approach for comparison of univariate normality tests

Abstract

Comparisons of normality tests based on absolute or average powers are bound to give ambiguous results, since these measures critically depend upon the alternative distribution, which cannot be specified. A test that is optimal against one type of alternative may perform poorly against other alternative distributions. Thus, an invariant benchmark has been proposed in the recent normality literature by computing Neyman-Pearson tests against each alternative distribution. However, the computational cost of this benchmark is high; therefore, this study proposes an alternative approach for computing it. The proposed min-max approach reduces the calculation cost associated with computing and estimating the Neyman-Pearson tests against each alternative distribution. An extensive simulation study is conducted to evaluate the selected normality tests using the proposed methodology. The proposed min-max method produces results similar to the benchmark based on Neyman-Pearson tests, but at a much lower computational cost.

1. Introduction

Normality of the data is the underlying distributional assumption of a multitude of statistical procedures and estimation techniques. In both cross-sectional and time series data, assuming normality without testing may affect the accuracy of econometric inference [1]. Statistical inference from regression models applied to time series [2], categorical [3] and count data [4] depends crucially on the assumption of normal errors. Experimental data sets generated in clinical chemistry for the construction of population reference ranges also require the assumption of normality [5]. In short, the normality assumption is key to validating inferences drawn from regression models and other statistical procedures. Diagnostic tests for normality are important: Blanca et al. [6] find that only 5.5 percent of 693 real data distributions are close to normality when skewness and kurtosis are considered together.

Given the importance of the subject, the literature has produced a plethora of goodness-of-fit tests to detect departures from normality [7–13]. With the development of several normality tests over the decades, power comparison of these statistics has received due consideration in the literature in search of the best test, thus helping researchers choose a suitable normality test [14–19]. Different characteristics of the normal distribution are exploited when developing normality statistics; consequently, the power of normality tests varies depending upon the nature of the non-normality [19]. Thus, one normality statistic may perform well for one alternative distribution and another statistic for another alternative non-normal distribution [18]. Comparison of normality tests via simulations is therefore bound to give ambiguous results, since these comparisons critically depend upon the alternative distribution, which cannot be specified.

This study rests on the finding that one normality test is optimal against one alternative and another test against another alternative distribution [20]. The best test's performance against each alternative distribution provides the benchmark for comparison of normality tests via the min-max criterion: the maximum deviation of each selected test from the benchmark is computed, and the test with the minimum such deviation is ranked best. This method reduces the calculation burden of computing and estimating the Neyman-Pearson test against each alternative distribution for the benchmark proposed in [18]. Another problem is that the alternative space is infinite dimensional. Since we plan to use numerical methods, we must narrow this space down to something small enough to permit exploration by numerical methods. At the same time, the space should be large enough to provide a good approximation to the full space of alternatives; failing that, it should be large enough to approximate the distributions conventionally used in simulation studies. Since first- and second-order departures from normality depend on the skewness and kurtosis of the distribution, we have used 72 alternatives with wide ranges of these parameters. This alternative space includes mixtures of uniform distributions, mixtures of t-distributions and the distributions used in the literature [14, 16–18, 21].

2. Normality tests

This section deals with the background and technical details of the selected normality tests. Each of these tests belongs to a different class of normality tests, e.g., ECDF-based, moment-based, and regression/correlation-based tests.

In the following literature review, we consider x1, x2, …, xn as a random sample of size n with sample mean x̄ and sample variance s². The sample skewness and kurtosis are defined as

√b1 = m3 / m2^(3/2), (1)

b2 = m4 / m2², (2)

where the rth central sample moment is defined as

m_r = (1/n) Σ_{i=1}^{n} (x_i − x̄)^r. (3)

2.1. Moment tests

2.1.1 The Bowman-Shenton K2-test.

Skewness measures the asymmetry of a distribution, and kurtosis measures its flatness or 'peakedness'. These two statistics have been widely used to differentiate between distributions.

The distributions of √b1 and b2 have been approximated by using the Pearson curves. D'Agostino [22] and Anscombe and Glynn [23] derived normalizing transformations for √b1 and b2 respectively; Z1(√b1) and Z2(b2) denote the resulting approximately standard normal variables. These can be computed by the following algorithms provided in [24].

Computational Algorithm for Z1(√b1):

  1. Compute √b1 as defined in (1) above.
  2. Compute the mean, variance and kurtosis of the sampling distribution of √b1 under normality and, from these, the constants of the normalizing approximation in [24].
  3. Compute Z1(√b1) from the standardized √b1 and these constants.

Computational Algorithm for Z2(b2):

  1. Compute b2 from the sample data.
  2. Compute the mean and variance of b2 under normality.
  3. Compute the standardized version of b2.
  4. Compute the third standardized moment of the sampling distribution of b2 [24].
  5. Compute the parameter A of the approximation in [24] from this third standardized moment.
  6. Compute Z2(b2) from A and the standardized b2.

D'Agostino and Pearson [8] proposed a test statistic for testing normality that combines √b1 and b2 in the following way:

K² = Z1²(√b1) + Z2²(b2),

where K² is distributed as chi-square with two degrees of freedom. The normality hypothesis is rejected for large values of the test statistic.
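
For readers who wish to reproduce K², a minimal sketch in Python is given below. It assumes SciPy, whose normaltest() implements the D'Agostino-Pearson statistic and whose skewtest() and kurtosistest() return the normalized components Z1(√b1) and Z2(b2).

```python
# Minimal sketch of K^2 using SciPy; normaltest() implements the
# D'Agostino-Pearson statistic described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=50)                    # a sample drawn under the null

z1, _ = stats.skewtest(x)                  # Z1(sqrt(b1))
z2, _ = stats.kurtosistest(x)              # Z2(b2)
k2, p_value = stats.normaltest(x)          # K^2 = Z1^2 + Z2^2

print(k2, z1**2 + z2**2, p_value)          # the first two values agree
# Reject normality when p_value falls below the chosen significance level.
```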

2.1.2 The Jarque-Bera test.

In the field of economics, the most widely used statistic for normality testing was introduced by Jarque and Bera [10, 11]. It is based on the standardized third and fourth moments:

JB = n [ (√b1)² / 6 + (b2 − 3)² / 24 ],

where n is the number of observations, √b1 = m3/m2^(3/2), b2 = m4/m2², and m_i is the ith central moment of the observations. Asymptotically, the JB-statistic is distributed as chi-square with two degrees of freedom. The hypothesis of normality is rejected for large values of the test statistic.
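
A short sketch of the JB computation is given below, assuming SciPy; the hand-computed value is checked against scipy.stats.jarque_bera().

```python
# JB from the biased central moments, checked against SciPy's jarque_bera().
import numpy as np
from scipy import stats

x = np.random.default_rng(1).normal(size=100)

m2 = np.mean((x - x.mean())**2)
m3 = np.mean((x - x.mean())**3)
m4 = np.mean((x - x.mean())**4)
jb = len(x) * ((m3 / m2**1.5)**2 / 6 + (m4 / m2**2 - 3)**2 / 24)

jb_scipy, p_value = stats.jarque_bera(x)
print(jb, jb_scipy, p_value)               # the two JB values coincide
```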

2.1.3 The Robust Jarque-Bera test.

Gel and Gastwirth [13] introduced robust measures of sample skewness and kurtosis by replacing the sample standard deviation with a robust measure of dispersion that is less sensitive to outliers, the (scaled) average absolute deviation from the sample median,

J_n = (√(π/2) / n) Σ_{i=1}^{n} |x_i − M|,

which leads to the robust JB-statistic

RJB = (n/C1) (m3 / J_n³)² + (n/C2) (m4 / J_n⁴ − 3)²,

where M is the sample median and C1 and C2 are constants given in [13]. The RJB statistic asymptotically follows the chi-square distribution with two degrees of freedom.
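
A sketch of the RJB computation is given below. The constants C1 = 6 and C2 = 64 are my reading of [13] and should be treated as assumptions to verify against the original paper.

```python
# Sketch of the RJB statistic; the constants C1 = 6 and C2 = 64 and the
# robust spread J_n follow my reading of [13] and should be verified.
import numpy as np
from scipy import stats

def rjb_statistic(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    jn = np.sqrt(np.pi / 2) * np.mean(np.abs(x - np.median(x)))  # robust dispersion J_n
    m3 = np.mean((x - x.mean())**3)
    m4 = np.mean((x - x.mean())**4)
    rjb = n / 6 * (m3 / jn**3)**2 + n / 64 * (m4 / jn**4 - 3)**2
    return rjb, stats.chi2.sf(rjb, df=2)                         # asymptotic chi-square(2) p-value

print(rjb_statistic(np.random.default_rng(2).normal(size=200)))
```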

2.1.4 The Bonett-Seier test.

An alternative measure of kurtosis (G-kurtosis) based on Geary's [25] test for normality is defined by Bonett and Seier [12] as

ω = 13.29 (ln σ − ln τ),

where σ and τ are the population standard deviation and mean absolute deviation respectively. The factor 13.29 scales ω so that it equals 3 under normality, matching the standard measure of kurtosis. To test H0: ω = 3, Bonett & Seier [12] used a statistic Zw based on the sample counterpart of ω and its standard error.

The Zw-statistic is approximately distributed as standard normal.
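
A sketch of Zw is given below; the scaling constants 13.29 and 3.54 are the values I recall from [12], and the use of the biased standard deviation is likewise an assumption to verify.

```python
# Sketch of the Bonett-Seier Zw statistic; the constants 13.29 and 3.54
# are assumed from [12], and sigma is taken as sqrt(m2).
import numpy as np
from scipy import stats

def bonett_seier_zw(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    sigma = np.sqrt(np.mean((x - x.mean())**2))     # sqrt of the second central moment
    tau = np.mean(np.abs(x - x.mean()))             # sample mean absolute deviation
    omega = 13.29 * (np.log(sigma) - np.log(tau))   # sample G-kurtosis, equals 3 under normality
    zw = np.sqrt(n + 2) * (omega - 3) / 3.54
    return zw, 2 * stats.norm.sf(abs(zw))           # two-sided normal p-value

print(bonett_seier_zw(np.random.default_rng(3).normal(size=100)))
```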

2.2. Distance/ECDF tests

This class of tests compares the empirical cumulative distribution function (ECDF), Fn(x), estimated from the data, with the fitted normal cumulative distribution function evaluated at the ordered observations, Zi = Φ((x(i) − x̄)/s). Stephens [26] provided versions of the ECDF tests for the case of unknown mean and variance. ECDF tests can be further classified into those involving either the supremum or the square of the discrepancies Fn(x(i)) − Zi.

ECDF tests involving the square of the discrepancies are known as the Cramér-von Mises family.

2.2.1 The Anderson-Darling A2-test.

The Anderson-Darling test is, in fact, a modified form of the Cramér-von Mises test that gives more weight to the tails of the distribution. The computational form of the Anderson-Darling statistic is

A² = −n − (1/n) Σ_{i=1}^{n} (2i − 1) [ln Zi + ln(1 − Z_{n+1−i})].

The A²-test is the best known of the ECDF tests. Its asymptotic distribution is known, and the critical values for finite samples were found to converge quickly to their asymptotic values for n ≥ 5.
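
In practice, A² with estimated mean and variance is available in SciPy; the sketch below assumes scipy.stats.anderson(), which also returns Stephens-type finite-sample critical values.

```python
# A^2 with estimated parameters via SciPy's anderson().
import numpy as np
from scipy import stats

x = np.random.default_rng(4).normal(size=75)
result = stats.anderson(x, dist='norm')
print(result.statistic)              # A^2
print(result.significance_level)     # significance levels (percent)
print(result.critical_values)        # matching critical values
# Reject normality when A^2 exceeds the critical value at the chosen level.
```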

2.2.2 The Kolmogorov-Smirnov test.

In the ECDF class of tests involving the supremum, a well-known statistic is the Kolmogorov-Smirnov test,

KS = max(D⁺, D⁻), where D⁺ = max_i (i/n − Zi) and D⁻ = max_i (Zi − (i − 1)/n).

The null hypothesis of normality is rejected for large values of the test statistic.
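
The sketch below computes the KS statistic with the mean and standard deviation estimated from the sample; note that the textbook KS p-value is not valid once parameters are estimated, which is why the paper relies on Monte Carlo critical values (Section 4).

```python
# KS statistic for a composite normal null (parameters estimated from data).
# The p-value returned by kstest() assumes known parameters and is only
# indicative here; exact critical values should come from simulation.
import numpy as np
from scipy import stats

x = np.random.default_rng(5).normal(size=50)
d_stat, naive_p = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))
print(d_stat, naive_p)
```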

2.2.3 The Za and Zc tests.

Zhang & Wu [15] added two likelihood-ratio statistics for normality testing, Za and Zc, to the class of ECDF tests.

Let X(1), X(2), …, X(n) be the order statistics of a continuous random variable X with distribution function F(x), used in the hypothesis testing setup H0: F(x) = F0((x − μ)/σ) for some μ and σ, against H1 otherwise, where F0(x) = Φ(x), the cumulative distribution function of the standard normal distribution.

The null hypothesis is rejected for large values of the test statistics.
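
A sketch of Za and Zc as I read them in [15] is given below; the index offsets (0.5, 0.75) and the use of the unbiased standard deviation are assumptions to check against the original paper, and critical values are obtained by simulation.

```python
# Sketch of the Zhang-Wu Z_A and Z_C statistics with estimated parameters;
# the exact forms follow my reading of [15] and should be verified.
import numpy as np
from scipy import stats

def zhang_wu(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    u = stats.norm.cdf((x - x.mean()) / x.std(ddof=1))   # Phi of standardized order statistics
    u = np.clip(u, 1e-12, 1 - 1e-12)                     # guard the logarithms
    z_a = -np.sum(np.log(u) / (n - i + 0.5) + np.log(1 - u) / (i - 0.5))
    z_c = np.sum(np.log((1 / u - 1) / ((n - 0.5) / (i - 0.75) - 1))**2)
    return z_a, z_c                                      # large values reject normality

print(zhang_wu(np.random.default_rng(6).normal(size=50)))
```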

2.3. Regression/correlation tests

2.3.1 The Shapiro-Wilk and Shapiro-Francia tests.

The main idea behind these tests is normal probability plotting: graphically assessing the linearity between the ordered observations x(i) and the expected values of the standard normal order statistics, mi. Formally, regression or correlation techniques are used to measure this linearity, hence the name of this group of tests.

The Shapiro and Wilk [27] W-statistic is defined as the ratio of two estimates of the variance of a normal distribution and can be calculated by

W = (Σ_{i=1}^{n} a_i x_(i))² / Σ_{i=1}^{n} (x_i − x̄)².

The vector of weights can be computed by

a' = m'V⁻¹ / (m'V⁻¹V⁻¹m)^(1/2),

where m and V are the mean vector and covariance matrix of the order statistics of the standard normal distribution [28]. If the distribution of xi is normal, the W-statistic is close to unity; otherwise it is less than unity. The critical values of W are tabulated up to sample sizes of 50. However, Shapiro and Francia [29] noted that as the sample size increases, the ordered observations tend to become independent (i.e. vij = 0 for i ≠ j). Treating V as an identity matrix, W can be extended to n larger than 50 by

W′ = (Σ_{i=1}^{n} m_i x_(i))² / (Σ_{i=1}^{n} m_i² Σ_{i=1}^{n} (x_i − x̄)²).

Values of {mi} are available in [30] up to sample sizes of 400. However, Weisberg and Bingham [31] suggested the approximation

m̃_i = Φ⁻¹((i − 3/8) / (n + 1/4))

to compute the values of {mi}. It was shown that the approximation works even for small samples, as there is no significant difference between the null distributions of the W and W′ statistics. This simplifies the computation of the test statistics.
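
The sketch below assumes SciPy for the Shapiro-Wilk W and computes W′ as the squared correlation between the ordered observations and the Weisberg-Bingham approximation to {mi}, which is equivalent to the ratio form above because the approximate m̃i sum to zero.

```python
# Shapiro-Wilk via SciPy, and Shapiro-Francia's W' from the Weisberg-Bingham
# approximation to the expected normal order statistics.
import numpy as np
from scipy import stats

x = np.random.default_rng(7).normal(size=60)

w, p_value = stats.shapiro(x)                    # Shapiro-Wilk W and its p-value

xs = np.sort(x)
n = len(xs)
i = np.arange(1, n + 1)
m_tilde = stats.norm.ppf((i - 3/8) / (n + 1/4))  # approximate m_i
w_prime = np.corrcoef(m_tilde, xs)[0, 1]**2      # Shapiro-Francia W'

print(w, p_value, w_prime)                       # both near 1 under normality; small values reject
```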

2.3.2 The Chen-Shapiro test.

Chen and Shapiro [32] proposed another competitor of the Shapiro-Wilk test, based upon normalized spacings, defined as

QH = (1/((n − 1) s)) Σ_{i=1}^{n−1} (x_(i+1) − x_(i)) / (H_{i+1} − H_i), with H_i = Φ⁻¹((i − 0.375)/(n + 0.25)),

where s is the sample standard deviation and Φ⁻¹(·) is the inverse of the standard normal distribution function. Since the authors have shown a close relationship between the Chen-Shapiro (CS) and the Shapiro-Wilk (W) tests, the performance of the CS test is expected to be comparable with that of the W test. The normality hypothesis is rejected for small values of the test statistic.
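
A sketch of QH following the spacing form given above is shown below; the plotting-position constants (0.375, 0.25) and the unbiased standard deviation are as I recall from [32], and critical values are obtained by simulation.

```python
# Sketch of the Chen-Shapiro QH statistic based on normalized spacings.
import numpy as np
from scipy import stats

def chen_shapiro_qh(x):
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    i = np.arange(1, n + 1)
    h = stats.norm.ppf((i - 0.375) / (n + 0.25))           # H_i
    qh = np.sum(np.diff(xs) / np.diff(h)) / ((n - 1) * xs.std(ddof=1))
    return qh                                              # near 1 under normality; small values reject

print(chen_shapiro_qh(np.random.default_rng(8).normal(size=50)))
```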

2.3.3 The COIN test.

Coin [33] proposed a normality test, aimed especially at symmetric non-normal alternatives, based on polynomial regression. Let x(i) be the vector of ordered observations drawn from a normal population with unknown mean μ and variance σ², so that one can write

x_(i) = μ + σ α_i + ε_i,

where μ and σ are the parameters of the best-fit line of a normal Q-Q plot, α_i are the expected values of the standard normal order statistics, and ε is a vector of errors assumed to be homoscedastic. These two parameters may be estimated by least squares. Instead of this model, Coin proposed the polynomial model

z_(i) = β1 α_i + β3 α_i³ + ε_i,

in which the vector of ordered observations x(i) is replaced by z(i), the vector of ordered standardized observations, and βi (i = 1, 3) are the fitting parameters. An estimate of β3 significantly different from zero implies that the sample is drawn from a symmetric non-normal distribution. Coin suggests using β̂3² as the statistic for testing the null hypothesis of normality, which is rejected for large values of the test statistic.
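
The sketch below only illustrates the polynomial-regression construction: it regresses the ordered standardized observations on αi and αi³ without an intercept and returns β̂3² as the statistic. The approximation used for αi and the standardization are my assumptions; the estimator and critical values in [33] may differ in detail.

```python
# Illustration of the COIN construction: fit z_(i) = b1*alpha_i + b3*alpha_i^3
# and use b3^2 as the test statistic (critical values via simulation).
import numpy as np
from scipy import stats

def coin_statistic(x):
    xs = np.sort(np.asarray(x, dtype=float))
    z = (xs - xs.mean()) / xs.std(ddof=1)                 # ordered standardized observations
    n = len(z)
    i = np.arange(1, n + 1)
    alpha = stats.norm.ppf((i - 0.375) / (n + 0.25))      # approximate expected order statistics
    design = np.column_stack([alpha, alpha**3])
    (b1, b3), *_ = np.linalg.lstsq(design, z, rcond=None)
    return b3**2                                          # large values indicate symmetric non-normality

print(coin_statistic(np.random.default_rng(9).normal(size=50)))
```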

2.3.4 The BCMR test.

Del Barrio, Cuesta-Albertos, Matrán and Rodríguez-Rodríguez [34] proposed a test statistic for normality based on the L2-Wasserstein distance between the sample distribution and the set of normal distributions.

Let x1, x2, …, xn be a random sample drawn from a distribution with distribution function F, and let Fn denote the empirical distribution function, Φ the distribution function of the standard normal law and s² the sample variance.

The resulting statistic is asymptotically equivalent to the Shapiro-Wilk and Shapiro-Francia statistics [34]. The normality hypothesis is rejected for large values of the statistic.

2.4. Other tests

2.4.1 The Gel-Miao-Gastwirth test.

Gel and Gastwirth [13] contributed to the literature on directed tests of normality by proposing a statistic aimed at detecting heavy tails and outliers of symmetric distributions. The test statistic is the ratio of the standard deviation to a robust measure of dispersion,

R = s / J_n, with J_n = (√(π/2) / n) Σ_{i=1}^{n} |x_i − M|,

which should tend to unity under normality, where M is the median of the sample data. The normality hypothesis is rejected for large values of the statistic. The statistic √n(R − 1) is asymptotically normal with zero mean and a fixed standard deviation given in [13]. The applications of this test can be extended to light-tailed distributions as well by using a two-sided test for rejecting the null hypothesis of normality.
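
A sketch of the R statistic is given below; rather than relying on the asymptotic constant, it obtains a one-sided Monte Carlo critical value, in the spirit of the simulation design of Section 4. The unbiased standard deviation is an assumption.

```python
# Sketch of the R statistic with a Monte Carlo critical value.
import numpy as np

def r_statistic(x):
    x = np.asarray(x, dtype=float)
    jn = np.sqrt(np.pi / 2) * np.mean(np.abs(x - np.median(x)))  # robust dispersion J_n
    return x.std(ddof=1) / jn                                    # near 1 under normality

rng = np.random.default_rng(10)
n = 50
null_r = np.array([r_statistic(rng.normal(size=n)) for _ in range(10_000)])
crit = np.quantile(null_r, 0.95)            # one-sided 5% critical value (heavy-tailed alternatives)

sample = rng.standard_t(df=3, size=n)       # a heavy-tailed alternative
print(r_statistic(sample), crit)            # reject normality when the statistic exceeds crit
```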

3. Alternative distributions

As already stated, to permit exploration by numerical methods we must narrow the infinite-dimensional alternative space down to a space large enough to approximate the distributions conventionally used in simulation studies. We have used 72 alternative distributions with wide ranges of skewness and kurtosis, since first- and second-order departures from normality depend on these parameters. The simulation study considers the distributions used in the literature (Table 1), mixtures of uniform distributions, and mixtures of t-distributions (Table 2). The alternative space includes all kinds of symmetric and asymmetric, short- and long-tailed distributions.

The mixtures of t- & uniform distributions are generated by the following rules. (4) (5) where, νi & ηi (i = 1,2) are the degrees of freedom and the means of the respective t-distributions and ai & bi (i = 1,2) are the bounds of uniform distributions.

4. Simulation study

An extensive simulation study is conducted to estimate the size and power of the selected normality tests. First, exact critical values are obtained for samples of size 25, 50 & 75 from the normal distribution, at the 0.05 level of significance, on the basis of 100,000 Monte Carlo simulations in MATLAB R2013a. Second, the powers of the fourteen normality statistics are computed against the general distributions and the mixtures of uniform and t-distributions.
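
The paper runs this step in MATLAB R2013a; the sketch below is a generic Python illustration of the same procedure, using Shapiro-Wilk's W (small values reject) as an example statistic and a chi-square(2) alternative as an example power calculation.

```python
# Generic sketch of the Monte Carlo critical-value and power estimation step
# (the paper used MATLAB; this only illustrates the procedure).
import numpy as np
from scipy import stats

def critical_value(statistic, n, alpha=0.05, reps=100_000, lower_tail=True, seed=0):
    rng = np.random.default_rng(seed)
    sims = np.array([statistic(rng.normal(size=n)) for _ in range(reps)])
    return np.quantile(sims, alpha if lower_tail else 1 - alpha)

def power(statistic, crit, draw, reps=10_000, lower_tail=True, seed=1):
    rng = np.random.default_rng(seed)
    sims = np.array([statistic(draw(rng)) for _ in range(reps)])
    return float(np.mean(sims < crit)) if lower_tail else float(np.mean(sims > crit))

w_stat = lambda x: stats.shapiro(x)[0]                       # Shapiro-Wilk W
crit_25 = critical_value(w_stat, n=25, reps=20_000)          # the paper used 100,000 replications
power_chi2 = power(w_stat, crit_25, draw=lambda rng: rng.chisquare(df=2, size=25))
print(crit_25, power_chi2)
```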

As stated earlier, no normality test can be uniformly most powerful against all alternative distributions; one test is optimal for one alternative and another is optimal for another alternative. The trajectory of the maximum power attained by any test against each alternative distribution provides the benchmark against which all tests can be compared. Deviations for each test are computed with reference to this benchmark.

Any function T(x) taking values in {0, 1} is called a hypothesis test. The size of the test is defined as

α = sup_{φ ∈ Φ} P_φ(T(x) = 1),

where φ belongs to the null space Φ. The power of the test against an alternative φ is the probability of not committing a type-II error, i.e.,

P_T(φ) = P_φ(T(x) = 1).

For a given alternative φ, the maximum achievable power over all tests of size α is defined as

P*(φ) = sup_{T: size(T) ≤ α} P_φ(T(x) = 1).

For different values of φ, we get different optimal test statistics. The locus of the powers of these statistics provides the benchmark. The following loss function is computed to evaluate each normality test T in terms of its deviation from the benchmark:

L(T) = max_φ [P*(φ) − P_T(φ)].

The test with the minimum loss, i.e. the smallest maximum deviation from the benchmark, is the most stringent test and is ranked best; a uniformly most powerful test, if one existed, would have zero deviation from the benchmark. This allows us to rank the normality tests in a unique manner.
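
The ranking step itself reduces to a few array operations. In the sketch below, `powers` is assumed to be a tests-by-alternatives array of estimated powers at a common size; the benchmark is the column-wise maximum, and each test is scored by its maximum shortfall from that benchmark.

```python
# Sketch of the min-max ranking: benchmark = best power per alternative,
# loss = maximum shortfall per test, rank = ascending loss.
import numpy as np

def rank_by_stringency(powers, test_names):
    powers = np.asarray(powers, dtype=float)       # shape: (tests, alternatives)
    benchmark = powers.max(axis=0)                 # best achievable power per alternative
    loss = (benchmark - powers).max(axis=1)        # maximum deviation from the benchmark per test
    order = np.argsort(loss)                       # most stringent (smallest loss) first
    return [(test_names[k], float(loss[k])) for k in order]

# Toy illustration with made-up powers for three tests and four alternatives.
powers = [[0.90, 0.40, 0.75, 0.60],
          [0.85, 0.55, 0.70, 0.65],
          [0.60, 0.60, 0.50, 0.80]]
print(rank_by_stringency(powers, ["T1", "T2", "T3"]))
```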

5. Results & discussion

Normality tests are evaluated against 72 alternative distributions, including mixtures of uniform distributions (18 distributions), mixtures of t-distributions (20 distributions) and the distributions used in the literature (34 distributions), with wide ranges of skewness and kurtosis. The alternative space includes all kinds of symmetric and asymmetric, short- and long-tailed distributions. The most stringent test is the one with the minimum deviation from the benchmark.

While evaluating the losses or deviations of the normality statistics against the selected alternative space, the CS test outperforms the remaining tests for small (n = 25) and medium (n = 50) sample sizes at the 5 percent level of significance, with deviations from the benchmark of 12.3 and 17.3 percent respectively (Table 3). These results corroborate the findings in [17, 18]. Shapiro-Wilk's W-test is the first-ranked statistic for the large sample size (n = 75), with a 27.4 percent deviation, closely followed by the CS, Zc, & Za statistics. For the third rank, this study recommends the BCMR, A2, & Za tests for small and BCMR for medium and large sample sizes. The JB and RJB tests perform poorly, with more than 90 percent losses for all sample sizes, which is in line with the findings in [18].

These results clearly indicate that the min-max strategy adopted in this study produces results similar to those achieved with the Neyman-Pearson benchmark of Islam [18], while the computational cost is reduced significantly.

It is interesting to note that symmetric short- and long-tailed alternative distributions are the worst alternatives for both top-ranked statistics, CS and W, in terms of maximum deviations from the benchmark for all sample sizes (Table 4). The W-test also outperforms the A2-test, with significantly smaller deviations from the benchmark, which is in line with the findings in [35].

These two statistics, along with the W′, BCMR & COIN tests, belong to the 'regression & correlation' class of normality tests. The worst alternatives for the remaining members of this class are test dependent; e.g., the worst alternatives for the COIN test are skewed and near-normal distributions. On balance, when considering the performance of the regression and correlation based group of normality statistics, CS is the best test (rank 1) for small and medium sample sizes, closely followed by the W test at rank two. For large samples, the W test outperforms the CS statistic by a margin of 3.2 percent and occupies the first rank. These two statistics are closely followed by the BCMR test, which is placed at rank three for all sample sizes. The Shapiro-Francia W′ test shows consistent performance by occupying the fifth position for all sample sizes, with maximum deviations ranging from 33 to 50 percent.

Moment based JB & RJB tests perform poorly against the short-tailed symmetric and slightly skewed alternatives (Figs 1 and 2). It is pertinent to mention that the performance of JB & RJB is same at medium sample size (n = 50). Other moment based tests under consideration in this study are K2 & Zw. The K2 statistic is ranked at 6 for small and medium and 8 for large samples sizes with deviations ranging from 60 to 96 percent. The Zw test is ranked at 7th position for small and large and at 8th for medium sample sizes with maximum deviations from the benchmark range from 78 to 88 percent.

Fig 1. Worst alternatives for JB & RJB (n = 25) in terms of deviations.

https://doi.org/10.1371/journal.pone.0255024.g001

Fig 2. Worst alternatives for JB & RJB (n = 75) in terms of deviations.

https://doi.org/10.1371/journal.pone.0255024.g002

Among the normality tests based on the empirical cumulative distribution function (ECDF), A2, Za and Zc occupy the third and fourth ranks for small samples. The normality tests proposed by Zhang & Wu [15], Za & Zc, performed well, occupying the second rank with respective maximum losses of 21.0 & 20.5 percent for small and 32.5 & 30.0 percent for large sample sizes. The Anderson-Darling statistic (A2) is ranked fourth for medium and large sample sizes, with 33.4 and 38.5 percent losses. Our results corroborate the findings in [15]. The KS test, together with the COIN test, did not perform well, with above 85 percent deviations from the benchmark for all sample sizes. The COIN test has a slight edge over the KS statistic in small sample sizes; for medium and large sample sizes, both share the same ranks with losses above 95 percent.

The R statistic introduced by Gel and Gastwirth [13] for symmetric distributions occupies rank 7 for small and medium sample sizes and rank 6 for large sample sizes, with deviations from the benchmark ranging from 76 to 85 percent when evaluated against the entire class of alternatives. The worst distributions for the R test belong to the asymmetric alternative space, for obvious reasons. Interestingly, the R test occupies the sixth and fifth ranks for small and for medium to large sample sizes respectively when evaluated against the symmetric alternatives (Table 5). The worst distributions for the R test belong to the symmetric short-tailed alternative space (Fig 3) for all sample sizes; therefore, the R test is not recommended for symmetric short-tailed alternatives. The COIN test performs considerably better than the R test, which is in line with the findings in [17, 33].

Table 5. Ranking of tests against symmetric alternatives.

https://doi.org/10.1371/journal.pone.0255024.t005

When considering the symmetric alternative space, the CS & COIN tests are the best options for testing normality for small to medium sample sizes, while the Shapiro-Wilk W test is recommended for large sample sizes (Table 5). The W-test occupies the second rank for small and medium sample sizes. The moment-based JB & RJB tests performed poorly against the symmetric class of alternatives as well; the worst distributions for these statistics belong to the symmetric short-tailed class of alternatives. These results corroborate the findings in [16, 35, 36]. The Zw test performs relatively well among the moment-based normality tests and occupies the fourth rank for all sample sizes, with maximum deviations from the benchmark ranging from 23 to 40 percent. The K2 test is ranked 7th & 8th for small and for medium to large samples respectively, with losses ranging from 44 to 96 percent.

When considering the regression and correlation based group of normality tests, the CS, COIN, W & BCMR tests are the best options against the symmetric alternatives and occupy the top three ranks in the table. Romão et al. [17] recommend the CS & W statistics for the asymmetric group of alternatives by comparing absolute powers. However, when these statistics are evaluated against a benchmark instead of absolute powers, they turn out to be best for symmetric alternative distributions as well. The Shapiro-Francia (W′) test does not perform well against symmetric alternatives and occupies ranks 5, 7 & 6 for small, medium and large sample sizes respectively.

Among the ECDF class of normality tests, A2 & Za occupy the third rank for small samples with 21.5 & 22.0 percent deviations from the benchmark. The Zc & Za are recommended for medium and large sample sizes as Anderson-Darling’s statistic (A2) occupies sixth and fourth positions for medium and large sample sizes respectively. In terms of maximum deviatoins, the Zc has slight edge to Za test for medium and large sample sizes which does not corroborate with the findings in [15]. The KS test does not perform well against the symmetric alternatives with more than 85 percent losses for all sample sizes.

When the selected normality tests are evaluated against the asymmetric class of alternatives, the W, Zc, & CS tests occupy the first rank for small, CS for medium, and CS, W, Zc & Za for large sample sizes (Table 6). On balance, the CS and W tests from the regression and correlation based group of normality tests are recommended for all sample sizes, whereas the COIN test did not perform well, with a very high range of deviations from the benchmark when the alternative distribution is drawn from the asymmetric distributional space, for obvious reasons. These findings corroborate those in [17, 18, 35].

Table 6. Ranking of tests against asymmetric alternatives.

https://doi.org/10.1371/journal.pone.0255024.t006

Among the ECDF class of tests, Zc is ranked as the number one statistic for small & large sample sizes and number two for the medium sample size against the selected asymmetric distributional space, closely followed by the Za test. Maximum deviations of these tests range from 5 to 12 percent. The Anderson-Darling test, A2, is placed at the fourth, fifth, and third positions for small, medium, and large sample sizes respectively, with maximum deviations from the benchmark ranging from 21 to 31 percent. Moment-based tests did not perform well, with more than 50 percent maximum deviations from the benchmark for all sample sizes against the asymmetric distributional space.

There is no significant difference between the performances of the BCMR and W tests of normality in terms of discriminating the long-tailed distributions (β2 > 3). Both statistics share the first rank when evaluated against the selected class of heavy-tailed distributions (Table 7), closely followed by the CS test. The Shapiro-Francia W′ test performed well for small samples and occupies the third rank with a power loss of 15.1 percent; however, from the regression and correlation class, the COIN test performs poorly and occupies the last rank with more than 85 percent power losses at all sample sizes. On balance, in terms of maximum deviations from the benchmark, the moment-based normality tests do not perform well (Table 7).

Table 7. Ranking of the tests against long-tailed alternatives.

https://doi.org/10.1371/journal.pone.0255024.t007

However, the moment-based statistics perform well when evaluated against the symmetric long-tailed distributions, with a clear dominance of the RJB test (Table 8). These results are in line with the findings in [18, 33, 36]. Overall, the JB & RJB tests perform well against the long-tailed distributional space except for the alternatives listed in Table 9. Among the ECDF class of normality tests, all the statistics except KS performed well and are listed among the top four tests for all sample sizes. The R test, when evaluated against the thick-tailed alternatives, does not perform well, with power deviations varying from 66 to 76 percent (Table 7).

Table 8. Powers of moment-based tests against symmetric long-tailed alternatives.

https://doi.org/10.1371/journal.pone.0255024.t008

Table 9. Worse long-tailed alternatives for JB & RJB (deviations in percentages).

https://doi.org/10.1371/journal.pone.0255024.t009

For distributions from the short-tailed alternative space (β2 < 3), we recommend CS & Zc for small, CS for medium and the W test for large sample sizes (Table 10). Romão et al. [17] also recommend the use of the CS & W tests for small and large sample sizes. The W-test is also ranked second for small and medium sample sizes, with respective maximum deviations of 15.5 & 20.5 percent from the benchmark. The performance of the W test is much better than that of the KS test irrespective of whether the alternative belongs to the short- or long-tailed distributional space, which corroborates the findings in [18, 35]. Both the Za & Zc statistics are among the top three positions, with Zc having a slight edge over Za against the short-tailed alternatives, which is in line with the findings in [18]. The Anderson-Darling test statistic (A2) also performs well and occupies the third & fourth ranks for small and for medium & large samples respectively. Based on the maximum deviations from the benchmark, the BCMR test is placed at rank three with respective power losses of 20.6, 24.3, & 34.0 percent. Among the correlation and regression based normality tests, the COIN test could not perform well when evaluated separately for short- and long-tailed alternatives. The performance of the JB, RJB, K2, & Zw tests is not up to the mark, with very high power deviations, which corroborates the findings in [36].

Table 10. Ranking of the tests against short-tailed alternatives.

https://doi.org/10.1371/journal.pone.0255024.t010

Table 11 presents the top five damaging distributions for each normality test at samples of size 25, 50, & 75. It is evident from the results that ECDF based normality tests suffer more against symmetric short-tailed and symmetric long-tailed distributions with significant outliers. Symmetric short-tailed and skewed distributions affect the performance of normality tests belonging to the regression and correlation class. However, the most damaging distributions for the moment based normality tests are specific to the individual tests in this class. For example, the JB & RJB tests suffer greater power loss against long-tailed alternatives at the small sample size and against negatively skewed alternatives at the medium and large sample sizes.

Table 11. Top five damaging distributions for normality tests.

https://doi.org/10.1371/journal.pone.0255024.t011

6. Conclusion

Comparison of normality tests without an invariant benchmark has not proven fruitful in the normality literature. This study proposes an alternative way to compute the benchmark instead of the Neyman-Pearson test-based benchmark proposed in the literature. The proposed benchmark is based on the min-max approach, which reduces the calculation cost of computing and estimating the Neyman-Pearson tests against each alternative from the selected distributional space. The min-max approach is based on the finding that one test is best against one alternative and another against another alternative [20]. Thus, against each alternative distribution, we get a different optimal normality test. The locus of the powers of these statistics provides the benchmark. Maximum deviations from the benchmark are computed for the selected normality statistics, and the test with the minimum loss or deviation is defined as the most stringent test. An extensive simulation study is conducted to rank the selected normality tests against a vast distributional space consisting of mixtures of uniform distributions, mixtures of t-distributions, and distributions used in the literature.

General recommendations derived from the analysis of maximum deviations from the benchmark indicate that, against the entire alternative space, the most stringent normality test is CS for small (n = 25) and medium (n = 50) sample sizes, and Shapiro-Wilk's W-test for the large sample size (n = 75), closely followed by the CS, Zc, & Za statistics. When considering the symmetric alternative space, the CS & COIN tests are the best options for testing normality for small to medium sample sizes, and the Shapiro-Wilk W test for large sample sizes. When the selected normality tests are evaluated against the asymmetric class of alternatives, the W, Zc, & CS tests occupy the first rank for small, CS for medium, and CS, W, Zc & Za for large sample sizes.

There is no significant difference between the performances of the BCMR and W tests of normality in terms of discriminating the long-tailed distributions (β2 > 3); both share the first rank when evaluated against the selected class of heavy-tailed distributions, closely followed by the CS test. For distributions from the short-tailed alternative space (β2 < 3), we recommend CS & Zc for small, CS for medium and the W test for large sample sizes.

References

  1. Costa M, Cavaliere G, Iezzi S. The role of the normal distribution in financial markets. In New Developments in Classification and Data Analysis 2005 (pp. 343–350). Springer, Berlin, Heidelberg.
  2. Giles DE. Spurious regressions with time-series data: further asymptotic results. Communications in Statistics—Theory and Methods. 2007 Apr 3;36(5):967–79.
  3. Wilde J. A simple representation of the Bera–Jarque–Lee test for probit models. Economics Letters. 2008 Nov 1;101(2):119–21.
  4. Quddus MA. Time series count data models: an empirical application to traffic accidents. Accident Analysis & Prevention. 2008 Sep 1;40(5):1732–41. pmid:18760102
  5. Henderson AR. Testing experimental data for univariate normality. Clinica Chimica Acta. 2006 Apr 1;366(1–2):112–29. pmid:16388793
  6. Blanca MJ, Arnau J, López-Montiel D, Bono R, Bendayan R. Skewness and kurtosis in real data samples. Methodology. 2013.
  7. Geary RC. Testing for normality. Biometrika. 1947 Dec 1;34(3/4):209–42. pmid:18918691
  8. D'Agostino RB, Pearson ES. Tests for departure from normality. Empirical results for the distributions of b2 and √b1. Biometrika. 1973 Dec 1;60(3):613–22.
  9. Pearson ES, D'Agostino RB, Bowman KO. Tests for departure from normality: Comparison of powers. Biometrika. 1977 Aug 1;64(2):231–46.
  10. Jarque CM, Bera AK. Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters. 1980 Jan 1;6(3):255–9.
  11. Jarque CM, Bera AK. A test for normality of observations and regression residuals. International Statistical Review/Revue Internationale de Statistique. 1987 Aug 1:163–72.
  12. Bonett DG, Seier E. A test of normality with high uniform power. Computational Statistics & Data Analysis. 2002 Sep 28;40(3):435–45.
  13. Gel YR, Gastwirth JL. A robust modification of the Jarque–Bera test of normality. Economics Letters. 2008 Apr 1;99(1):30–2.
  14. Shapiro SS, Wilk MB, Chen HJ. A comparative study of various tests for normality. Journal of the American Statistical Association. 1968 Dec 1;63(324):1343–72.
  15. Zhang J, Wu Y. Likelihood-ratio tests for normality. Computational Statistics & Data Analysis. 2005 Jun 1;49(3):709–21.
  16. Yazici B, Yolacan S. A comparison of various tests of normality. Journal of Statistical Computation and Simulation. 2007 Feb 1;77(2):175–83.
  17. Romão X, Delgado R, Costa A. An empirical power comparison of univariate goodness-of-fit tests for normality. Journal of Statistical Computation and Simulation. 2009 Mar 17;80(5):545–591.
  18. Islam TU. Stringency-based ranking of normality tests. Communications in Statistics—Simulation and Computation. 2017 Jan 2;46(1):655–68.
  19. Islam TU. Ranking of normality tests: An appraisal through skewed alternative space. Symmetry. 2019 Jul;11(7):872.
  20. Ul-Islam T. Normality testing-A new direction. International Journal of Business and Social Science. 2011 Jan 1;2(3).
  21. Bispo R, Marques TA, Pestana D. Statistical power of goodness-of-fit tests based on the empirical distribution function for type-I right-censored data. Journal of Statistical Computation and Simulation. 2012 Feb 1;82(2):173–81.
  22. D'Agostino RB. Transformation to normality of the null distribution of g1. Biometrika. 1970 Dec 1:679–81.
  23. Anscombe FJ, Glynn WJ. Distribution of the kurtosis statistic b2 for normal samples. Biometrika. 1983 Apr 1;70(1):227–34.
  24. D'Agostino RB, Belanger A, D'Agostino RB Jr. A suggestion for using powerful and informative tests of normality. The American Statistician. 1990 Nov 1;44(4):316–21.
  25. Geary RC. The ratio of the mean deviation to the standard deviation as a test of normality. Biometrika. 1935 Oct 1;27(3/4):310–32.
  26. Stephens MA. EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association. 1974 Sep 1;69(347):730–7.
  27. Shapiro SS, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika. 1965 Dec 1;52(3/4):591–611.
  28. Royston P. Remark AS R94: A remark on algorithm AS 181: The W-test for normality. Journal of the Royal Statistical Society. Series C (Applied Statistics). 1995 Jan 1;44(4):547–51.
  29. Shapiro SS, Francia RS. An approximate analysis of variance test for normality. Journal of the American Statistical Association. 1972 Mar 1;67(337):215–6.
  30. Harter HL. Expected values of normal order statistics. Biometrika. 1961 Jun 1;48(1–2):151–65.
  31. Weisberg S, Bingham C. An approximate analysis of variance test for non-normality suitable for machine calculation. Technometrics. 1975 Feb 1;17(1):133–4.
  32. Chen L, Shapiro SS. An alternative test for normality based on normalized spacings. Journal of Statistical Computation and Simulation. 1995 Dec 1;53(3–4):269–87.
  33. Coin D. A goodness-of-fit test for normality based on polynomial regression. Computational Statistics & Data Analysis. 2008 Jan 10;52(4):2185–98.
  34. Del Barrio E, Cuesta-Albertos JA, Matrán C, Rodríguez-Rodríguez JM. Tests of goodness of fit based on the L2-Wasserstein distance. Annals of Statistics. 1999 Aug 1:1230–9.
  35. Yap BW, Sim CH. Comparisons of various types of normality tests. Journal of Statistical Computation and Simulation. 2011 Dec 1;81(12):2141–55.
  36. Thadewald T, Büning H. Jarque–Bera test and its competitors for testing normality–a power comparison. Journal of Applied Statistics. 2007 Jan 1;34(1):87–105.