A note on the Wilcoxon-Mann-Whitney test and tied observations

Markus Neuhäuser; Graeme D. Ruxton

doi:10.1371/journal.pone.0309074

Abstract

Recently, it was recommended to omit tied observations before applying the two-sample Wilcoxon-Mann-Whitney test (McGee, 2018). Using a simulation study, we argue that exact tests using all the data (including tied values) are a preferable approach. Exact tests with tied observations included guarantee the type I error rate, exploit the significance level better, and have larger power than the corresponding tests after the omission of tied observations. The omission of ties can produce a considerable change in the shape of the sample, and so can violate underlying test assumptions. Thus, on both theoretical and practical grounds, the recommendation to omit tied values cannot be supported; it is preferable to analyse the whole data set in the same way whether or not ties occur, ideally with an exact permutation test.

Citation: Neuhäuser M, Ruxton GD (2024) A note on the Wilcoxon-Mann-Whitney test and tied observations. PLoS ONE 19(8): e0309074. https://doi.org/10.1371/journal.pone.0309074

Editor: Benjamin Jerry Ridenhour, University of Idaho, UNITED STATES OF AMERICA

Received: October 25, 2023; Accepted: July 19, 2024; Published: August 21, 2024

Copyright: © 2024 Neuhäuser, Ruxton. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting information files. To be precise, we used simulated data only. Our simulation code is provided, together with the seed value used and some comments to aid understanding of our R code. Thus all simulations can easily be reproduced.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Data points with identical values are called ties, and they occur frequently. Recently, it was proposed to omit tied observations before applying the two-sample Wilcoxon-Mann-Whitney test [1]. Here, we argue against this recommendation.

Tied observations do not create insuperable problems for application of the Wilcoxon-Mann-Whitney test. It is true that the normal approximation of the Wilcoxon rank-sum can be poor when there are ties, although this depends on the number and pattern of ties [2]. However, with the advent of modern computing power, there is no need to use the normal approximation in order to apply this test. Indeed, even in the absence of ties, when sample sizes are small or moderate an exact permutation version of the test is preferable to the one that relies on asymptotic normality.
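For reference, the normal approximation mentioned above is usually written with a tie-corrected variance. The notation in this sketch is ours and is not taken from [2]: W is the sum of the mid-ranks in the first group, N = n₁ + n₂ is the pooled sample size, and t_j is the number of observations sharing the j-th distinct value in the pooled sample (g distinct values in total).

```latex
\begin{align*}
\mathrm{E}_0(W) &= \frac{n_1 (N+1)}{2}, \\
\mathrm{Var}_0(W) &= \frac{n_1 n_2}{12}\left[(N+1) - \frac{\sum_{j=1}^{g} \left(t_j^3 - t_j\right)}{N(N-1)}\right], \\
Z &= \frac{W - \mathrm{E}_0(W)}{\sqrt{\mathrm{Var}_0(W)}}.
\end{align*}
```

Without ties every t_j equals 1, the correction term vanishes, and the variance reduces to the familiar n₁n₂(N+1)/12. The exact permutation test needs none of these quantities, because it enumerates the null distribution of W directly.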

A permutation test can be used not only to calculate “the null distribution of the possible difference in means (or medians, or ratios, etc.)” [1], but also to obtain the null distribution of the Wilcoxon rank-sum [3]. Neuhäuser [4] as well as Brunner et al. [2] explicitly demonstrate how the exact permutation test can be applied in the presence of ties. In the case of large sample sizes, an approximate permutation test can be applied based on a random sample of permutations (see e.g. [4]). Thus, there is no theoretical or practical need for omitting tied observations.
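The exact permutation test with mid-ranks can be sketched in a few lines. The Python code below is an illustration only, not the R code used for this paper: the function names `midranks` and `exact_wmw_pvalue` are hypothetical, and the two-sided p-value is assumed to count all relabellings whose rank sum lies at least as far from its null expectation as the observed one.

```python
from itertools import combinations


def midranks(values):
    """Return the mid-rank of each value (tied values share the mean of their ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def exact_wmw_pvalue(x, y):
    """Two-sided exact permutation p-value for the rank sum of x, with ties kept."""
    ranks = midranks(list(x) + list(y))
    # Mid-ranks are multiples of 0.5, so doubled ranks are exact integers.
    doubled = [int(round(2 * r)) for r in ranks]
    n1, total = len(x), len(ranks)
    expected = n1 * (total + 1)  # null expectation of the doubled rank sum
    observed_dev = abs(sum(doubled[:n1]) - expected)
    extreme = count = 0
    # Enumerate every way of assigning n1 of the pooled ranks to the first group.
    for idx in combinations(range(total), n1):
        count += 1
        if abs(sum(doubled[i] for i in idx) - expected) >= observed_dev:
            extreme += 1
    return extreme / count
```

The enumeration visits all C(N, n₁) relabellings, which is feasible for the sample sizes considered here (184756 relabellings for n₁ = n₂ = 10). For large samples, a random sample of relabellings would replace the full enumeration.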

As noted by McGee [1], observations are often discretized or rounded, e.g. due to detection limits of the measurement instruments, even when the underlying distributions are continuous. In those cases, however, tied observations are not evenly distributed across samples, as they are in McGee’s simulation study. Instead, ties are more likely where the density of the distribution is higher. Omitting tied values can thus produce a considerable change in the shape of the distribution of the remaining data, compared to the whole data set. Because of this possible change of the distribution alone, one should retain tied observations for the statistical analysis. This holds not only for the Wilcoxon-Mann-Whitney test, but also for other methods such as the t test. A fundamental assumption of statistical testing is that the sample is representative of the underlying population from which it is drawn. If ties are non-randomly distributed (as we argue they should generally be expected to be) then changing the sample by omitting tied values breaks this assumption, and the reduced sample is no longer representative of the underlying population that was originally of interest.
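The claim that rounding produces ties mainly in high-density regions can be checked with a small simulation. The following Python sketch is illustrative and rests on our own assumptions: "omitting ties" is taken to mean dropping every observation whose value occurs more than once in the pooled sample, the split into "centre" (|value| < 1) and "tail" is arbitrary, and the names `omit_ties` and `tied_share_by_region` are hypothetical.

```python
import random
from collections import Counter


def omit_ties(x, y):
    """Drop every observation whose value occurs more than once in the pooled sample.

    Assumption: this is one plausible reading of "omitting tied observations".
    """
    counts = Counter(list(x) + list(y))
    kept_x = [v for v in x if counts[v] == 1]
    kept_y = [v for v in y if counts[v] == 1]
    return kept_x, kept_y


def tied_share_by_region(runs=2000, n=20, digits=1, seed=1):
    """Share of tied values near the mode and in the tails of rounded normal samples."""
    rng = random.Random(seed)
    tied = {"centre": 0, "tail": 0}
    seen = {"centre": 0, "tail": 0}
    for _ in range(runs):
        # Rounding a continuous variable mimics limited measurement precision.
        sample = [round(rng.gauss(0.0, 1.0), digits) for _ in range(n)]
        counts = Counter(sample)
        for v in sample:
            region = "centre" if abs(v) < 1.0 else "tail"
            seen[region] += 1
            if counts[v] > 1:
                tied[region] += 1
    return {region: tied[region] / seen[region] for region in seen}
```

Calling `tied_share_by_region()` returns the proportion of tied observations in each region. Under these assumptions the share is markedly larger in the centre than in the tails, so removing ties thins out the middle of the sample and leaves the tails comparatively intact.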

Moreover, as ties are non-randomly distributed in practice, the possible change in the shape of the distribution of the remaining data after the omission of ties might lead to changes between groups in skewness, variance and other characteristics that can violate the shift alternative framework of the statistical testing approach.

In addition, when clinical trial data are analysed, the omission of tied observations would violate the intention-to-treat ideal of including all randomised subjects in the analysis. In this context, note that Mao [5] recently investigated the Wilcoxon-Mann-Whitney test in the presence of noncompliance. The noncompliers, included in an intention-to-treat analysis, are not an evenly distributed subset, and Mao [5] showed that the properties of the Wilcoxon-Mann-Whitney test depend on whether noncompliers are more likely in high-density regions or in the tails of the outcome distribution.

Furthermore, when rounding is carried out, the amount of rounding, i.e. the number of remaining decimal places, might be arbitrary. If rounding is necessary, we suggested that the “number of decimal places should be selected by the scientist on the basis of their understanding of the precision of measurements involved” [6, p. 298].

Materials and methods

Like McGee [1], we performed a simulation study (using R). Again like McGee [1], we investigate four distributions: normal (with variance 1), exponential (with rate 1), Cauchy, and Laplace (both with scale parameter 1). No changes in variability are considered; possible differences between the two groups are location shifts only. We investigate the Wilcoxon-Mann-Whitney test both based on complete samples with rounded values (and using mid-ranks), and after omission of tied observations. However, in contrast to McGee [1], tied observations are not evenly distributed across samples in our simulation: we round the simulated values to one or two decimal places in order to simulate the creation of tied values by limited measurement precision. In McGee’s simulation, by contrast, a given number of values is randomly selected, regarded as ties and omitted (see e.g. the supplementary file SmallSamplePowerOmit.txt, available at https://github.com/MonnieMcGee/TiesInRankBasedTests/blob/master/R-Code/SmallSamplePowerOmit.txt). Thus, in McGee’s simulations the percentage of tied observations was kept constant in the different scenarios. In our simulations, the number of ties varies, which we believe better reflects the situation in statistical practice.

We present results for balanced and unbalanced sample sizes, for the exact permutation test and the asymptotic test, as well as for two different significance levels, 0.01 and 0.05. All results are based on 10000 simulation runs. Sometimes omission of tied observations caused one of the two groups to be empty. In this case no test could be performed. The actual type I error rate or power, respectively, is estimated as the number of tests for which the p-value is less than the nominal level of significance, divided by the number of performed tests. When rounding to two decimal places, empty groups did not occur. When rounding to one decimal place, empty groups occurred 0 to 207 times (with mean 21.4, and median 5) in 10000 simulations.
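The estimation procedure described above can be sketched as follows. This Python code is not the simulation code from the Supporting information (which is written in R); it is a reduced illustration under stated assumptions: only the normal distribution and the exact test are covered, the defaults (300 runs, n₁ = n₂ = 5) are far smaller than in the paper to keep the run time short, ties are omitted by dropping every value that occurs more than once in the pooled sample, and all function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def doubled_midranks(values):
    """Twice the mid-rank of each value, so that tied ranks stay exact integers."""
    return [2 * sum(u < v for u in values) + sum(u == v for u in values) + 1
            for v in values]


def exact_pvalue(x, y):
    """Two-sided exact permutation p-value for the mid-rank sum of x."""
    pooled = list(x) + list(y)
    ranks2 = doubled_midranks(pooled)
    n1 = len(x)
    expected2 = n1 * (len(pooled) + 1)  # null expectation of the doubled rank sum
    observed = abs(sum(ranks2[:n1]) - expected2)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n1):
        total += 1
        if abs(sum(ranks2[i] for i in idx) - expected2) >= observed:
            hits += 1
    return hits / total


def omit_ties(x, y):
    """Assumed omission rule: drop values occurring more than once in the pooled sample."""
    counts = Counter(list(x) + list(y))
    return [v for v in x if counts[v] == 1], [v for v in y if counts[v] == 1]


def rejection_rate(shift, omit, runs=300, n1=5, n2=5, digits=1, alpha=0.05, seed=1):
    """Estimate type I error (shift = 0) or power (shift != 0) of the exact test."""
    rng = random.Random(seed)
    rejections = performed = 0
    for _ in range(runs):
        x = [round(rng.gauss(0.0, 1.0), digits) for _ in range(n1)]
        y = [round(rng.gauss(shift, 1.0), digits) for _ in range(n2)]
        if omit:
            x, y = omit_ties(x, y)
            if not x or not y:
                continue  # one group is empty, so no test can be performed
        performed += 1
        if exact_pvalue(x, y) < alpha:
            rejections += 1
    rate = rejections / performed if performed else float("nan")
    return rate, performed
```

Because the seed is fixed, `rejection_rate(1.0, omit=False)` and `rejection_rate(1.0, omit=True)` analyse the same simulated data sets, so any difference between the two estimates is due to the handling of ties alone. The second return value reports how many tests could actually be performed.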

Results and conclusion

The simulation results are displayed in Tables 1 to 4.

Table 1. Actual type I error and power for balanced sample sizes (n₁ = n₂ = 10), simulated observations rounded to 2 decimal places.

https://doi.org/10.1371/journal.pone.0309074.t001

Table 2. Actual type I error and power for balanced sample sizes (n₁ = n₂ = 10), simulated observations rounded to 1 decimal place.

https://doi.org/10.1371/journal.pone.0309074.t002

Table 3. Actual type I error and power for unbalanced sample sizes (n₁ = 14, n₂ = 7), simulated observations rounded to 2 decimal places.

https://doi.org/10.1371/journal.pone.0309074.t003

Table 4. Actual type I error and power for unbalanced sample sizes (n₁ = 14, n₂ = 7), simulated observations rounded to 1 decimal place.

https://doi.org/10.1371/journal.pone.0309074.t004

Although the tests after omission of tied observations do not have an inflated type I error rate, the tests with tied observations included have an actual size closer to the nominal significance level, especially when considering a significance level of 1%, and/or rounding to one decimal place. In general, the exact permutation test with tied observations included is a good choice in all the situations we investigate: the type I error is guaranteed, and the power is larger, often much larger, than the power of the corresponding tests after the omission of tied observations. With the advent of modern computing power, asymptotic tests provide no advantage over exact tests, but rely on assumptions about the data that will not always be met and often cannot be easily tested. In summary, the tests after the omission of tied observations show a lower exploitation of the significance level and a lower power. Hence, for realistic scenarios, they cannot be recommended with respect to type I error and power.

McGee [1] reported a more erratic, or worse, type I error rate and a reduced power when 25% or 50% of the observations are tied and are omitted. On the basis of those results, one might advocate omitting ties only when the percentage of ties in the sample is less than 15%. However, such a strategy would often preclude defining the statistical approach ahead of data collection, since the percentage of ties would normally be difficult to predict with any confidence. Further, the exact test with tied observations included has better type I error and power values than the test performed after omission of ties in the scenarios presented in Tables 1 and 3, where the average proportion of ties is not larger than 10%. Thus, the exact permutation test is always a good choice. This test is available in various statistical software packages including the open-source statistical software R. In addition, a free and easy-to-use online calculator that provides the exact permutation Wilcoxon-Mann-Whitney test is available at https://ccb-compute2.cs.uni-saarland.de/wtest/ [7].

When applying this permutation test there is, even in the case of discrete or ordinal data, no need, and no reason, to replace the test statistic. Thus, the permutation test should be performed with the Wilcoxon rank sum, not with the difference in means, medians, or ratios. The exact permutation Wilcoxon-Mann-Whitney test can be applied even in the extreme case of binary data: when two groups are compared based on binary data the standard method is Fisher’s exact test; this test can be considered as a special case of the exact version of the Wilcoxon-Mann-Whitney test; both tests result in identical p-values [2, p. 106].
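The connection to Fisher’s exact test can be illustrated numerically. The Python sketch below compares one-sided p-values only, since conventions for two-sided exact p-values differ between implementations; the function names and the example data are hypothetical. With binary data the mid-rank sum of the first group is an increasing function of its number of ones, so both tests order the possible relabellings identically.

```python
from itertools import combinations
from math import comb


def fisher_one_sided(a, n1, k, total):
    """P(A >= a), where A is the number of ones in group 1 under the null hypothesis.

    A is hypergeometric: n1 observations drawn from `total`, of which k are ones.
    """
    upper = min(n1, k)
    favourable = sum(comb(k, j) * comb(total - k, n1 - j) for j in range(a, upper + 1))
    return favourable / comb(total, n1)


def exact_wmw_one_sided(x, y):
    """P(W >= w_obs) for the mid-rank sum W of group 1 over all relabellings."""
    pooled = list(x) + list(y)
    # Doubled mid-ranks: 2 * (#smaller) + (#equal) + 1, exact integers even with ties.
    ranks2 = [2 * sum(u < v for u in pooled) + sum(u == v for u in pooled) + 1
              for v in pooled]
    n1 = len(x)
    observed = sum(ranks2[:n1])
    sums = [sum(ranks2[i] for i in idx)
            for idx in combinations(range(len(pooled)), n1)]
    return sum(s >= observed for s in sums) / len(sums)


# Hypothetical binary data: 4 of 5 ones in group 1, 1 of 6 ones in group 2.
x = [1, 1, 1, 0, 1]
y = [0, 0, 1, 0, 0, 0]
p_fisher = fisher_one_sided(a=sum(x), n1=len(x), k=sum(x) + sum(y), total=len(x) + len(y))
p_wmw = exact_wmw_one_sided(x, y)
```

For these example data both `p_fisher` and `p_wmw` equal 31/462: of the 462 possible relabellings, 31 place at least four ones in the first group.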

It should be noted that McGee [1] also investigated other methods of handling ties. Her detailed study shows some advantages and disadvantages of the various procedures. McGee [1] concluded that, aside from omission of tied observations, the next best option would be jittering the data, i.e. adding random noise to the observations to break ties. However, an exact permutation test with mid-ranks is preferable to using randomly broken ties [4, 6, 8].

To conclude, we see no reason to recommend the omission of tied observations when applying the Wilcoxon-Mann-Whitney test, and good reason not to recommend such omission, even when the number of ties is low. Exact testing of the whole dataset regardless of ties is a better option.

Supporting information

S1 File. R code for simulation.

https://doi.org/10.1371/journal.pone.0309074.s001

(PDF)

References

1. McGee M. Case for omitting tied observations in the two-sample t-test and the Wilcoxon-Mann-Whitney test. PLoS ONE 2018; 13(7): e0200837. pmid:30040850
2. Brunner E, Bathke AC, Konietschke F. Rank and pseudo-rank procedures for independent observations in factorial designs. Cham: Springer; 2019.
3. Lehmann EL. Nonparametrics: Statistical methods based on ranks. New York: Springer (revised first edition); 2006.
4. Neuhäuser M. Nonparametric statistical tests: A computational approach. Boca Raton: CRC Press; 2012.
5. Mao L. On the relative efficiency of the intent-to-treat Wilcoxon–Mann–Whitney test in the presence of noncompliance. Biometrika 2022; 109: 873–880. pmid:36035896
6. Neuhäuser M, Ruxton GD. Round your numbers in rank tests: exact and asymptotic inference and ties. Behavioral Ecology and Sociobiology 2009; 64: 297–303.
7. Marx A, Backes C, Meese E, Lenhof HP, Keller A. EDISON-WMW: Exact dynamic programing solution of the Wilcoxon-Mann-Whitney test. Genomics Proteomics Bioinformatics 2016; 14: 55–61. pmid:26829645
8. Tilquin P, van Keilegom I, Coppieters W, le Boulenge E, Baret PV. Non-parametric interval mapping in half-sib designs: use of midranks to account for ties. Genetical Research 2003; 81: 221–228. pmid:12929913

