Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

Xuejun Jiang; Xu Guo; Ning Zhang; Bo Wang; Bo Zhang

doi:10.1371/journal.pone.0195894

Abstract

This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV.

Citation: Jiang X, Guo X, Zhang N, Wang B, Zhang B (2018) Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials. PLoS ONE 13(4): e0195894. https://doi.org/10.1371/journal.pone.0195894

Editor: Qizhai Li, University of the Chinese Academy of Sciences, CHINA

Received: January 29, 2018; Accepted: April 2, 2018; Published: April 19, 2018

Copyright: © 2018 Jiang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: Dr. Xuejun Jiang’s research was partially supported by Natural Science Foundation of China (11101432), Natural Science Foundation of Guangdong Province of China (2017A030313012), and Shenzhen Sci-Tech Fund (JCYJ20170307110329106). Dr. Xu Guo’s research was partially supported by National Natural Science Foundation of China (11601227 and 11626130) and Natural Science Foundation of Jiangsu Province, China (BK20150732). The research project was partially funded by the National Institute of Mental Health (R34MH077523).

Competing interests: The authors have declared that no competing interests exist.

Introduction

In randomized controlled trials, the effectiveness (or efficacy) of a treatment is commonly characterized by the difference between the distributional locations of the treatment group and its control group. Hypothesis testing is the primary statistical inference approach for examining treatment effects in clinical trials, where it is conducted to detect whether there is any difference between the distributional locations of the treatment group and the control group. When the primary endpoint is one-dimensional and normally distributed for both study groups, the two-sample t test is the standard tool. Yet the two-sample t test may not be valid when the normality assumption is violated, and it is not robust to outliers and heavy-tailed distributions. A number of robust nonparametric tests have been developed in the literature as complements to the two-sample t test. The classic Wilcoxon-Mann-Whitney test [1], which uses the rank sum, is a nonparametric counterpart of the two-sample t test. Yuen [2] and Keselman et al. [3] recommended constructing the tests using trimmed means. Recently, Fried and Dehling [4] proposed a series of robust nonparametric tests for detecting univariate two-sample location difference. These tests were constructed based upon unscaled and properly scaled robust location estimators of distributions, including medians and Hodges-Lehmann estimators. The numerical studies reported by Fried and Dehling [4] showed that the test statistics were robust to outliers and non-normality and efficient in detecting univariate two-sample location shift. Mathur [5] proposed a strictly nonparametric bivariate test constructed from an extended U statistic and concluded that the test statistic did not depend on the covariance structure of the underlying population and was more powerful than the existing tests.

In randomized controlled trials, the effectiveness of a treatment may be defined not by one or two but by multiple primary endpoints, and the significance of the treatment effect is then determined by the multivariate location shift between the two multivariate distributions of the treatment and control groups. In these clinical trials, multivariate hypothesis testing procedures are needed to detect a potential location shift between two samples drawn with a multivariate primary endpoint. The conventional univariate two-sample t test was extended to the multivariate setting by Hotelling [6], and the resulting test statistic is known as Hotelling’s T² statistic. Hotelling’s T² test inherits the limitations of the univariate two-sample t test: it is not robust to multivariate outliers and is not valid when the multivariate normality assumption is violated. This motivated the development of alternative multivariate two-sample location tests. Hettmansperger and Oja [7] developed a multivariate sign test for detecting location deviation among multiple multivariate samples. Hettmansperger et al. [8] introduced affine invariant analogues of the two-sample Mann-Whitney-Wilcoxon rank sum test. Neuhaus and Zhu [9] proposed multivariate distribution-free permutation test statistics that were built upon projected univariate versions of multivariate data. Henze et al. [10] introduced a class of consistent tests, in which the test statistic is a weighted integral of the squared modulus of the difference of the empirical characteristic functions of one multivariate sample and another multivariate sample plus a location shift.

In this article, we extend the robust nonparametric test statistics proposed by Fried and Dehling [4] and Mathur [5] to the multivariate setting. A series of robust multivariate nonparametric tests are proposed using the component-wise medians, Hodges-Lehmann location estimators, and an extended U statistic. Univariate test statistics for detecting multivariate two-sample location shift are constructed for the robust multivariate nonparametric tests as (i) unscaled maximum of the component-wise medians or the Hodges-Lehmann estimators, (ii) scaled maximum of the component-wise medians or the Hodges-Lehmann estimators, (iii) maximum of the scaled component-wise medians or the Hodges-Lehmann estimators, or (iv) the extended U statistic. A bootstrap approach and a permutation approach are introduced for determining the p-values of the proposed test statistics. We conduct simulation studies to examine performance of the proposed robust nonparametric test statistics in detecting multivariate two-sample location shift. The numerical results given by the bootstrap procedure demonstrate that the proposed robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is more powerful than the bootstrap procedure. To demonstrate the use of these proposed robust multivariate nonparametric tests, the proposed hypothesis tests are applied to detecting the intervention effect of the Thai Healthy Choices study [11], a study that promotes a four-session motivational interviewing-based intervention to reduce risk behaviors among youth living with HIV (the Thai Healthy Choices study was designed jointly by Wayne State University and the Thai Red Cross AIDS Research Center, and implemented in Bangkok, Thailand).

The scientific contribution of this article is multifold. First, a series of new robust nonparametric test statistics is proposed for detecting location shift between two multivariate samples collected from the treatment and control groups, respectively, in clinical trials. Both a bootstrap approach and a permutation approach are introduced to implement the proposed tests and obtain the corresponding p-values. These give practitioners a variety of test statistics, with two distinct implementation approaches, for testing treatment effects when the normality assumption for the samples is violated. Second, comprehensive numerical studies are conducted, and the results show explicit benefits of using the proposed tests over the Hotelling’s T² and extended U tests in terms of controlling Type I error and boosting statistical power. Third, the article presents a representative example, the Thai Healthy Choices study, and shows how the proposed robust nonparametric hypothesis testing procedures can be implemented to test the treatment or intervention effect in a clinical trial.

Tests on two-sample location shift

Consider two random samples {X₁, ⋯, X_m} and {Y₁, ⋯, Y_n} with p-dimensional independent multivariate observations X_i = (X_i1, ⋯, X_ip)′, i = 1, ⋯, m, and Y_j = (Y_j1, ⋯, Y_jp)′, j = 1, ⋯, n, for which X_i ∼ F(x) and Y_j ∼ G(x). The null hypothesis of equality of F(x) and G(x) and its alternative hypothesis that there is a location shift between the two multivariate distributions are

H₀: F(x) = G(x) versus H₁: F(x) = G(x + Δ). (1)

A natural idea to test the above hypotheses is to compare location estimators of the two distributions. The sample means X̄ and Ȳ can be used for this purpose, which leads to the well-known Hotelling’s T² test. However, the Hotelling’s T² test is constructed under multivariate normality and therefore performs poorly when there are outliers or when the underlying true distributions of the two samples are not multivariate normal.
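For reference, the classical benchmark mentioned above can be sketched in a few lines. This is a minimal sketch assuming the textbook pooled-covariance form of the two-sample Hotelling’s T² statistic, whose rescaled value follows an F distribution with (p, m + n − p − 1) degrees of freedom under multivariate normality; the function name `hotelling_t2` and the toy data are illustrative assumptions, not taken from the article.

```python
import numpy as np

def hotelling_t2(x, y):
    """Two-sample Hotelling's T^2 with pooled covariance (textbook form).

    x has shape (m, p) and y has shape (n, p). Returns T^2, the rescaled
    F statistic, and its degrees of freedom (p, m + n - p - 1).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, p = x.shape
    n = y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((m - 1) * np.cov(x, rowvar=False)
              + (n - 1) * np.cov(y, rowvar=False)) / (m + n - 2)
    pooled = np.atleast_2d(pooled)  # keeps the p = 1 case two-dimensional
    t2 = (m * n) / (m + n) * float(diff @ np.linalg.solve(pooled, diff))
    f_stat = (m + n - p - 1) / (p * (m + n - 2)) * t2
    return t2, f_stat, (p, m + n - p - 1)

# Toy data (an assumption for illustration): a shift in every component.
rng = np.random.default_rng(0)
x = rng.normal(size=(25, 4))
y = rng.normal(size=(25, 4)) + np.array([0.5, 1.0, 1.0, 2.0])
t2, f_stat, dof = hotelling_t2(x, y)
print(t2, f_stat, dof)
```

Because the statistic is built from sample means and a sample covariance matrix, a single extreme observation can move it substantially, which is the weakness the robust alternatives are meant to address.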

Tests based on unscaled median difference and Hodges-Lehmann estimators

Here, we propose a series of robust nonparametric test statistics, based on robust estimators of distribution locations, as competitors of the Hotelling’s T² test statistic. A general approach to constructing such nonparametric tests is to compute an estimate Δ̂ of the location difference Δ and then reject the null hypothesis H₀ if Δ̂ is far from zero. As usual, we can replace the difference of sample means with the difference between the two sample medians:

Δ̂ = med{Y₁, ⋯, Y_n} − med{X₁, ⋯, X_m}. (2)

In (2), med{X₁, ⋯, X_m} and med{Y₁, ⋯, Y_n} are the p-dimensional median vectors of the two samples. The median vector of a sample is defined as the vector of component-wise medians. That is, the kth component of med{X₁, ⋯, X_m} is the median of X_1k, ⋯, X_mk, where X_ik is the kth component of the p-dimensional observation X_i for i = 1, ⋯, m, and the kth component of med{Y₁, ⋯, Y_n} is the median of Y_1k, ⋯, Y_nk, where Y_jk is the kth component of the p-dimensional observation Y_j for j = 1, ⋯, n. In practice, however, Δ̂ cannot be directly used as a test statistic for the pair of hypotheses in (1), since Δ̂ is a p-dimensional vector rather than a scalar. Therefore, the following maximum of the absolute values of the p component-wise median differences in Δ̂ can be considered:

max_{1 ≤ k ≤ p} |Δ̂_k|,

where Δ̂_k is the kth component of Δ̂ for k = 1, ⋯, p. Under the null hypothesis this maximum should be close to zero, whereas under the alternative hypothesis it deviates from zero.
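The unscaled median-based statistic described in this subsection, the largest absolute component of the difference between the two component-wise median vectors, can be sketched as follows. This is a minimal sketch: the function name `max_abs_median_shift`, the sign convention (second sample minus first), and the toy data are assumptions made for illustration.

```python
import numpy as np

def max_abs_median_shift(x, y):
    """Largest absolute component of the component-wise median difference.

    x has shape (m, p) and y has shape (n, p); rows are observations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    delta_hat = np.median(y, axis=0) - np.median(x, axis=0)  # p-vector
    return float(np.max(np.abs(delta_hat)))

# Toy data: the extreme value 100 in the second sample barely matters,
# because only the component-wise medians enter the statistic.
x = np.array([[0.0, 0.0], [1.0, 10.0], [2.0, 1.0]])
y = np.array([[1.0, 0.0], [2.0, 0.0], [100.0, -5.0]])
print(max_abs_median_shift(x, y))
```

The absolute value makes the statistic insensitive to the sign convention chosen for the difference.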

Note that, although the direct sample medians med{X₁, ⋯, X_m} and med{Y₁, ⋯, Y_n} in (2) are robust estimators for the locations of the two samples, these medians are not very efficient, as each of them exploits little of the information in the sample data. To balance efficiency against robustness, two types of Hodges-Lehmann estimators were developed [12, 13]. Multivariate analogs of the univariate Hodges-Lehmann estimators are and where the p-dimensional multivariate median vectors are likewise defined as in (2). The test statistics using the multivariate Hodges-Lehmann estimators to detect location shift between two multivariate samples are then proposed as the absolute component-wise maximum of and : and where and are the kth component of and , respectively, for k = 1, ⋯, p.
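As an illustration of the two Hodges-Lehmann constructions, the sketch below assumes their standard univariate forms applied component by component: the two-sample estimator (median of all pairwise differences between the samples) and the difference of one-sample estimators (each the median of pairwise averages within a sample). These forms, the sign convention, and all function names are assumptions for illustration and may differ in detail from the article's displayed definitions.

```python
import numpy as np

def hl_two_sample(x, y):
    """Component-wise median of all pairwise differences y_j - x_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = y[None, :, :] - x[:, None, :]          # shape (m, n, p)
    return np.median(diffs.reshape(-1, x.shape[1]), axis=0)

def hl_one_sample(z):
    """Component-wise median of pairwise averages (z_i + z_j) / 2, i < j."""
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(z.shape[0], k=1)
    return np.median((z[i] + z[j]) / 2.0, axis=0)

def hl_one_sample_difference(x, y):
    """Difference of the one-sample Hodges-Lehmann estimates of y and x."""
    return hl_one_sample(y) - hl_one_sample(x)

def max_abs(v):
    """Collapse a p-dimensional estimate into a scalar test statistic."""
    return float(np.max(np.abs(v)))

# Toy data (assumed): the second sample is the first one shifted by (3, -1).
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 2))
y = x + np.array([3.0, -1.0])
print(hl_two_sample(x, y), hl_one_sample_difference(x, y))
print(max_abs(hl_two_sample(x, y)), max_abs(hl_one_sample_difference(x, y)))
```

Both estimators use every pair of observations rather than a single order statistic, which is the source of their efficiency gain over the plain median.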

Tests based on scaled median difference and Hodges-Lehmann estimators

Because the three unscaled statistics above only measure the component-wise maximum variability between the two samples, a scaled version of each is required to construct a more robust nonparametric test statistic. To this end, a related measure of the variability within the two samples is needed for standardization. For , the following p-dimensional median vector is the measure of component-wise differences between the two samples: where is the joint median-corrected sample and and . Then, the absolute component-wise maximum of S₁ is defined as and the standardized version of can be formulated as For and , the following p-dimensional median vectors of the absolute set of differences in the samples and within the joint median-corrected sample can be taken as the measure of component-wise differences within the two samples: and Then, the absolute component-wise maxima of them are defined as and respectively. The scaled versions of and are consequently constructed as and

An alternative standardization procedure for , , and can also be applied, which is to standardize each component of them and then take the maximum of all standardized components. This alternative standardization procedure leads to the following test statistics for detecting a multivariate two-sample location shift: and for l = 1, 2, ⋯, 5, in which S_lk denotes the kth element of S_l, l = 1, 2, 3. Under the null hypothesis, the proposed test statistics T_l and , l = 1, 2, ⋯, 5, should be close to zero, whereas they should be far from zero under the alternative hypothesis. When the dimension p is equal to 1, these test statistics reduce to the test statistics introduced by Fried and Dehling [4] and for l = 1, 2, ⋯, 5.

Tests based on U statistics

A U statistic recently proposed by Mathur [5] was designed to test for a bivariate two-sample location shift. Here, we extend it to detect a multivariate two-sample location shift. Specifically, the extended U test statistic for multivariate sample location detection is defined as (3) where is the Euclidean distance from {X₁, ⋯, X_m} to the origin and is the Euclidean distance from {Y₁, ⋯, Y_n} to the origin. The null hypothesis is rejected if the observed value of the extended U statistic exceeds a critical value of U obtained by permutation.

Implementation: A bootstrap procedure

Here, a bootstrap procedure is introduced to numerically approximate the p-values of the proposed robust nonparametric tests. Suppose two random samples {X₁, ⋯, X_m} and {Y₁, ⋯, Y_n} of p-dimensional independent multivariate observations are collected. The null hypothesis H₀: F(x) = G(x) and its alternative hypothesis for a location shift in the two multivariate distributions H₁: F(x) = G(x + Δ) are considered. The finite-sample distributions of the proposed test statistics are generally unknown, so critical values for testing this pair of hypotheses are not available in closed form. Therefore, a bootstrap method can be adopted to approximate the underlying distribution of a test statistic and subsequently determine the corresponding p-value. In the bootstrap procedure, a pseudo sample {X₁*, ⋯, X_m*} is drawn from the pooled sample {X_i, Y_j; i = 1, ⋯, m, j = 1, ⋯, n} with replacement, and another pseudo sample {Y₁*, ⋯, Y_n*} is drawn from the same pooled sample, also with replacement. Let V denote any one of the investigated test statistics, and let V* be the bootstrap version of V that is calculated from the paired bootstrap pseudo samples {X₁*, ⋯, X_m*} and {Y₁*, ⋯, Y_n*}. Then, the null hypothesis is rejected if V is larger than the 100(1 − α)% quantile of the bootstrap distribution of V*, where α is the significance level of the test. It has been confirmed in the literature that the above bootstrap procedure produces a valid approximation to the distribution of the test statistic V [14–16].
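The bootstrap procedure just described can be sketched as follows. This is a minimal sketch under stated assumptions: the p-value is taken as the fraction of bootstrap statistics V* that are at least as large as the observed V (which leads to approximately the same decision as comparing V with the upper-α bootstrap quantile), the default statistic is the maximum absolute component-wise median difference, and the names `bootstrap_p_value` and `max_abs_median_shift` are hypothetical.

```python
import numpy as np

def max_abs_median_shift(x, y):
    """Largest absolute component of the component-wise median difference."""
    return float(np.max(np.abs(np.median(y, axis=0) - np.median(x, axis=0))))

def bootstrap_p_value(x, y, statistic=max_abs_median_shift, n_boot=500, seed=0):
    """Approximate the p-value of `statistic` by resampling the pooled sample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    pooled = np.vstack([x, y])
    observed = statistic(x, y)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Both pseudo samples are drawn WITH replacement from the pooled sample.
        x_star = pooled[rng.integers(0, m + n, size=m)]
        y_star = pooled[rng.integers(0, m + n, size=n)]
        boot[b] = statistic(x_star, y_star)
    p_value = float(np.mean(boot >= observed))
    return observed, p_value

# Toy data (assumed): 4-dimensional samples with a shift of 0.5 per component.
rng = np.random.default_rng(1)
x = rng.normal(size=(25, 4))
y = rng.normal(size=(25, 4)) + 0.5
observed, p_value = bootstrap_p_value(x, y)
print(observed, p_value, p_value <= 0.05)
```

Any of the scalar test statistics can be passed through the `statistic` argument, as long as it takes the two samples and returns a number for which large values indicate a location shift.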

Implementation: A permutation procedure

A permutation procedure is a competitive alternative to the bootstrap procedure for deriving critical values for the proposed robust nonparametric tests. In the permutation procedure, the pooled sample {X_i, Y_j; i = 1, ⋯, m, j = 1, ⋯, n} is repeatedly split, without replacement, into two pseudo samples {X₁*, ⋯, X_m*} and {Y₁*, ⋯, Y_n*}. Let V* denote the permuted version of an investigated test statistic V, calculated from the paired permuted pseudo samples {X₁*, ⋯, X_m*} and {Y₁*, ⋯, Y_n*}. Then, the null hypothesis is rejected if V is larger than the 100(1 − α)% quantile of the permutation distribution of V*, where α is the significance level of the test.
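The permutation procedure differs from the bootstrap only in how the pseudo samples are formed: the pooled sample is shuffled and split, so every observation is used exactly once. The sketch below is a minimal illustration under the same assumptions as before: a Monte Carlo subset of random permutations rather than full enumeration, a p-value defined as the fraction of permuted statistics at least as large as the observed one, and hypothetical function names.

```python
import numpy as np

def max_abs_median_shift(x, y):
    """Largest absolute component of the component-wise median difference."""
    return float(np.max(np.abs(np.median(y, axis=0) - np.median(x, axis=0))))

def permutation_p_value(x, y, statistic=max_abs_median_shift, n_perm=500, seed=0):
    """Approximate the p-value of `statistic` from random splits of the pooled sample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    pooled = np.vstack([x, y])
    observed = statistic(x, y)
    perm = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffle row indices, then split WITHOUT replacement into sizes m and n.
        idx = rng.permutation(len(pooled))
        perm[b] = statistic(pooled[idx[:m]], pooled[idx[m:]])
    p_value = float(np.mean(perm >= observed))
    return observed, p_value

# Toy data (assumed): 4-dimensional samples with a shift of 0.5 per component.
rng = np.random.default_rng(1)
x = rng.normal(size=(25, 4))
y = rng.normal(size=(25, 4)) + 0.5
observed, p_value = permutation_p_value(x, y)
print(observed, p_value, p_value <= 0.05)
```

Shuffling whole rows keeps each p-dimensional observation intact, so the dependence among the components is preserved in every pseudo sample.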

Simulation studies

This section reports numerical results from a simulation study that was conducted to demonstrate the merits of the proposed hypothesis tests and to compare them with Hotelling’s T². In this simulation study, we aim to examine and compare the performance of the proposed hypothesis tests for detecting a location shift across different pairs of samples. The sample {X₁, ⋯, X_m} was generated from F(x) and the sample {Y₁, ⋯, Y_n} was generated from G(x). Four different pairs of distributions F(x) and G(x) were considered: (i) F(x) was a p-dimensional multivariate normal distribution N_p(1_p, Σ_p), where 1_p is a p-dimensional vector with each component equal to one and Σ_p is the variance-covariance matrix, and G(x) was the location shift distribution N_p(1_p + Δ, Σ_p), (ii) F(x) was a p-dimensional multivariate t distribution with 1 degree of freedom t₁(1_p, Σ_p), and G(x) was the location shift distribution t₁(1_p + Δ, Σ_p), (iii) F(x) was a p-dimensional multivariate t distribution with 3 degrees of freedom t₃(1_p, Σ_p), and G(x) was the location shift distribution t₃(1_p + Δ, Σ_p), and (iv) F(x) was the p-dimensional joint distribution of the diagonal elements of a Wishart random matrix that followed the Wishart distribution W_p(3, Σ_p), where 3 is the degrees of freedom and Σ_p is the scale matrix, and G(x) = F(x + Δ) was the location shift distribution. In this simulation study, two variance-covariance matrices were used to generate the simulation data: one is the independent variance-covariance matrix I_p×p, which is the p × p identity matrix, and the other is a non-independent variance-covariance matrix with diagonal elements equal to 1 and off-diagonal elements equal to 0.5. The dimension p of the two samples was set to 4, and the sample sizes were set as n = m = 10, 25, and 50. In this simulation study, the location vector Δ in each of the four pairs of distributions was specified as Δ = (0.5δ, δ, δ, 2δ)′, in which δ took a value of 0, 0.5, 1, 1.5, or 2.
The three proposed unscaled test statistics, the scaled statistics T_l, and their component-wise standardized counterparts, l = 1, 2, ⋯, 5, as well as Hotelling’s T² and the extended U statistic, were applied in two-sample multivariate hypothesis testing to detect the location shift. A total of 1000 simulation data sets were generated from each pair of specified distributions of the two samples, and the hypothesis tests were then carried out using each of the investigated test statistics. The rejection rate was subsequently calculated as the proportion of the 1000 simulation data sets in which the null hypothesis H₀: F(x) = G(x) was rejected by each of the investigated test statistics. When δ = 0, the pair of true distributions of the two samples have the identical location, and thus the rejection rate corresponds to the simulated Type I error of the hypothesis tests. When δ ≠ 0, the pair of true distributions of the two samples reside in different locations, and the rejection rate corresponds to the simulated power of the hypothesis tests. In this simulation study, the number of bootstrap samples was set to 500 and the significance level was set to 0.05.
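The simulation design above can be sketched as follows. This is a minimal sketch, not the authors' code: the multivariate t draws use the standard normal-over-root-chi-square construction, the Wishart diagonal is built from sums of squared normal vectors, the test inside the loop is a small permutation test on the median-based statistic, and every function name is hypothetical. The example run uses far fewer replications and permutations than the study, purely to keep it fast.

```python
import numpy as np

P = 4  # dimension used in the simulation study

def shift_vector(delta):
    """Location shift (0.5*delta, delta, delta, 2*delta)'."""
    return np.array([0.5, 1.0, 1.0, 2.0]) * delta

def equicorrelated(p=P, rho=0.5):
    """Covariance with unit diagonal and off-diagonal elements rho."""
    return np.full((p, p), rho) + (1.0 - rho) * np.eye(p)

def draw_mvt(rng, size, loc, cov, df):
    """Multivariate t draws: normal vector divided by sqrt(chi-square / df)."""
    z = rng.multivariate_normal(np.zeros(len(loc)), cov, size=size)
    w = rng.chisquare(df, size=size) / df
    return loc + z / np.sqrt(w)[:, None]

def draw_wishart_diag(rng, size, cov, df=3):
    """Diagonal of a Wishart(df, cov) matrix: sums of df squared normal draws."""
    z = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=(size, df))
    return np.sum(z ** 2, axis=1)

def median_stat(x, y):
    return float(np.max(np.abs(np.median(y, axis=0) - np.median(x, axis=0))))

def perm_p_value(x, y, rng, n_perm=200):
    """Small permutation test on the median-based statistic."""
    pooled, m = np.vstack([x, y]), len(x)
    observed = median_stat(x, y)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        hits += median_stat(pooled[idx[:m]], pooled[idx[m:]]) >= observed
    return hits / n_perm

def rejection_rate(draw_pair, p_value_fn, n_rep=1000, alpha=0.05, seed=0):
    """Share of simulated data sets in which H0 is rejected at level alpha."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        x, y = draw_pair(rng)
        hits += p_value_fn(x, y) <= alpha
    return hits / n_rep

# Example: pair (i), normal samples with independent components, n = m = 10.
test_rng = np.random.default_rng(123)

def normal_pair(rng, delta=1.0, size=10):
    loc = np.ones(P)
    x = rng.multivariate_normal(loc, np.eye(P), size=size)
    y = rng.multivariate_normal(loc + shift_vector(delta), np.eye(P), size=size)
    return x, y

rate = rejection_rate(normal_pair, lambda x, y: perm_p_value(x, y, test_rng), n_rep=50)
print(rate)  # simulated power for delta = 1 under this reduced setup
```

Setting `delta=0.0` in `normal_pair` turns the same loop into an estimate of the Type I error, and swapping in `draw_mvt` or `draw_wishart_diag` gives the heavy-tailed and skewed scenarios.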

The simulation results of the test statistics based on the bootstrap procedure are presented in S1–S6 Tables. S1–S3 Tables report the Type I errors and power obtained from the simulated paired samples that were generated using the independent variance-covariance matrix with different sample sizes. S4–S6 Tables report the Type I errors and power obtained using the non-independent variance-covariance matrix. It is observed that, when the samples were generated from two multivariate normal distributions with a location shift, Hotelling’s T², the extended U statistic, and T_l and their component-wise standardized counterparts, l = 2, 3, performed the best among all the investigated test statistics in terms of Type I errors and power as δ varied. There was not sufficient numerical evidence that the Hotelling’s T² statistic outperformed the other five statistics. The tests based on the Hodges-Lehmann estimators (the two unscaled Hodges-Lehmann statistics, T_l, and the component-wise standardized counterparts of T_l, l = 2, 3, 4, 5) were more powerful than those based on medians (the unscaled median statistic, T₁, and its component-wise standardized counterpart). The choice of the measure of variability within the two samples (i.e., the choice between S₂ and S₃, or between their component-wise maxima) had very little impact on the performance of the test statistics.

When one sample was generated from a multivariate t distribution or a Wishart distribution and the other sample was generated from its location shift counterpart, the proposed robust nonparametric test statistics outperformed the Hotelling’s T² and extended U statistics in detecting the location shift between the two samples. The power of these robust test statistics was mostly larger than that of the Hotelling’s T² and extended U statistics. Among the nonparametric test statistics, as in the case of multivariate normal distributions, the tests based on the Hodges-Lehmann estimators were more powerful than those based on the medians. The scaled nonparametric tests generally outperformed their unscaled counterparts. The nonparametric tests based on T_l and their component-wise standardized counterparts, l = 2, 3, were the most powerful among the investigated test statistics, and the Type I errors given by these four test statistics were mostly close to 0.05. The power of the investigated test statistics consistently increased as the location difference between the two samples and the sample sizes increased.

The simulation results of the test statistics based on the permutation approach are presented in Tables 1–6. Tables 1–3 report the Type I errors and power obtained from the simulated paired samples that were generated using the independent variance-covariance matrix with different sample sizes. Tables 4–6 report the Type I errors and power obtained using the non-independent variance-covariance matrix. It is observed that, when the samples were generated from two multivariate normal distributions with a location shift, the performance of all the test statistics was comparable. Moreover, when one sample was generated from a multivariate t distribution or a Wishart distribution and the other sample was generated from its location shift counterpart, the proposed robust nonparametric test statistics outperformed the Hotelling’s T² and extended U statistics in detecting the location shift between the two samples, as was also shown by the bootstrap procedure. The tests based on the Hodges-Lehmann estimators (the two unscaled Hodges-Lehmann statistics, T_l, and the component-wise standardized counterparts of T_l, l = 2, 3, 4, 5) were slightly more powerful than those based on medians (the unscaled median statistic, T₁, and its component-wise standardized counterpart). A cross comparison of the Type I errors and power given by the bootstrap approach and the permutation approach showed that the permutation approach provided a more stringent control of Type I error and was generally more powerful than the bootstrap procedure. The performance of the nonparametric tests T_l and their component-wise standardized counterparts, l = 2, 3, 4, 5, did not differ when the permutation approach was applied. Although in Tables 1–6 the scaled nonparametric test statistics cannot be distinguished from their unscaled counterparts, this finding may not generalize, since Fried and Dehling [4] explicitly demonstrated the advantages of the scaled nonparametric test statistics over the unscaled ones.

Table 1. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on permutation approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with variance-covariance matrix I_4×4 and sample sizes n = m = 10.

https://doi.org/10.1371/journal.pone.0195894.t001

Table 2. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on permutation approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with variance-covariance matrix I_4×4 and sample sizes n = m = 20.

https://doi.org/10.1371/journal.pone.0195894.t002

Table 3. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on permutation approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with variance-covariance matrix I_4×4 and sample sizes n = m = 50.

https://doi.org/10.1371/journal.pone.0195894.t003

Table 4. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on permutation approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with the non-independent variance-covariance matrix and sample sizes n = m = 10.

https://doi.org/10.1371/journal.pone.0195894.t004

Table 5. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on permutation approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with the non-independent variance-covariance matrix and sample sizes n = m = 20.

https://doi.org/10.1371/journal.pone.0195894.t005

Table 6. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on permutation approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with the non-independent variance-covariance matrix and sample sizes n = m = 50.

https://doi.org/10.1371/journal.pone.0195894.t006

Naturally, the proposed nonparametric test statistics function properly without the multivariate normality assumption that the classical Hotelling’s T² test requires, and are therefore robust to non-normality and outliers. This is the primary reason why, in the simulation studies, the proposed tests were comparable to the Hotelling’s T² and extended U tests when the two samples were simulated from multivariate normal distributions and outperformed those two tests when normality did not hold for the simulated samples.

Statistical analysis of the Thai Healthy Choices study

This section introduces the Thai Healthy Choices study and reports the results of hypothesis testing on the effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV [11]. The proposed nonparametric robust test statistics T_l and their component-wise standardized counterparts, l = 1, 2, ⋯, 5, as well as Hotelling’s T² and the extended U statistic, were applied in the hypothesis testing procedure to detect a multivariate difference between the intervention and control groups in the study.

The Thai Healthy Choices study

The Thai Healthy Choices study was conducted at the Thai Red Cross AIDS Research Center in Bangkok [11]. Thai youth living with HIV and attending the Thai Red Cross AIDS Research Centre clinics in Bangkok, who were interested in participating in the study, were referred by their physicians to the study team. Participants were eligible to enroll in the study if they were between 16 and 25 years old, were HIV-positive, and understood spoken and written Thai well enough to participate in study assessments and sessions. Upon completion of consent, participants were randomized in a one-to-one ratio to receive either a designed intervention named Healthy Choices (intervention group) or general health education (control group). At the baseline visit, participants completed the assessments. After the baseline visit, participants began to attend either four Healthy Choices sessions in the intervention group or four general health education sessions in the control group, based on randomization. The sessions in both groups occurred at 1, 2, 6 and 12 weeks after the baseline visit. Each session took approximately 60 min. Assessments similar to those at the baseline visit were conducted at 1 month after the fourth session and again at 6 months after the fourth session in both groups.

The intervention group received Healthy Choices, a four-session individual-level Motivational Interviewing (MI) counseling intervention that targeted two of three possible risk behaviors: sexual risks, alcohol use, and antiretroviral adherence. The intervention was delivered in Thai by an MI-trained interventionist. The details of the intervention have been published elsewhere [11]. Session 1 focused on eliciting the participant’s view of the behavior, exploring barriers as well as sociocultural factors affecting risks, and building motivation to initiate the change plan. Session 2 followed a similar format with a focus on the second targeted behavior. Sessions 3 and 4 were used to formalize the personalized behavior change plan, reinforce commitment to change, and identify strategies to maintain healthy behaviors and prevent relapse. All MI strategies to enhance motivation were used throughout all sessions. The control group received four individualized sessions of general health education unrelated to HIV risk behaviors. Session 1 focused on healthy diet, Session 2 on exercise, and Session 3 on smoking and healthy sleep habits. Session 4 was an overall review of the participant’s knowledge learned during the prior sessions. The contents of the sessions were adapted from the health education materials published by the Thai Ministry of Public Health. All sessions were delivered didactically by a research assistant who read the contents of the health education manual to the participant. The research assistant received no MI training and was instructed to avoid discussing HIV-related topics, including sexual behavior, HIV disclosure, alcohol and substance use, and medication adherence, with the participant.

There were six primary clinical measures for the success of the investigated intervention. (1) HIV sexual risk score. An HIV sexual risk scoring system was empirically created based on eight sexual behavior characteristics: sexual intercourse, condom use, number of partners, HIV status of partners, anal sex, receptive anal sex, receptive vaginal sex, and alcohol use with sex. A score (ranging from 1 to 13) was calculated for each participant at each study visit based on the individuals sexual activities in the previous 30 days. The purpose of the scoring system was to provide a broad view of the quantifiable magnitude of an individuals sexual risk. (2) Viral load. Blood samples for plasma HIV viral loads were obtained at baseline, 1 month follow-up, and 6 months follow-up in both study groups and were analyzed by COBAS AmpliPrep/Amplicor HIV-1 Monitor Test, version 1.5 (Roche Molecular Systems, Branchburg, NJ), with the lower limit of detection at 50 copies/ml. (3) HIV stigma. Participants completed the 12-item HIV Stigma Scale, which was developed from Berger’s 40-item HIV Stigma Scale [17]. The measure contains four stigma subscales, with three items per each subscale, representing personalized stigma, disclosure concerns, negative self-image, and public attitude stigma. Cronbach’s α was 0.80 in the present study. (4) Mental health. Participants completed the 12-item Thai General Health Questionnaire covering depression, anxiety, social impairment, and somatic complaints. All items were rated on a four-point Likert scale, ranging from 1 (not at all) to 4 (much more than usual). The scores were averaged and a mean score ≥2 was considered clinically significant. Cronbach’s α was 0.85 in the present study. (5) Self-efficacy on confidence in avoiding multiple sex partners, and (6) self-efficacy on confidence in using condoms. 
The Self-Efficacy for Health Promotion and Risk Reduction questionnaire contains 6 items on confidence in using a condom and 3 items on confidence in avoiding sex with multiple partners. Items were rated on a 5-point Likert scale ranging from 1 (very sure I cannot) to 5 (very sure I can). Cronbach’s α was 0.89 in this study. Figs 1–3 display the histograms of HIV sexual risk scores, self-efficacy on avoiding multiple partners, and self-efficacy on condom use for treatment and control groups at baseline and 6-month visits. Figs 4–6 display the boxplots of viral load, HIV stigma, and mental health for treatment and control groups at baseline and 6-month visits.
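
As an illustration of how the scale summaries described above can be computed, the following minimal Python sketch evaluates Cronbach’s α with the standard formula and flags a mean item score of at least 2, the cutoff stated for the Thai General Health Questionnaire. The function names and the small response matrix are hypothetical and are not taken from the study data.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_participants, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def clinically_significant(items, cutoff=2.0):
    """Flag participants whose mean item score is at or above the cutoff."""
    return np.asarray(items, dtype=float).mean(axis=1) >= cutoff


# Hypothetical responses: 4 participants x 3 items on a 1-4 Likert scale.
demo = np.array([[1, 1, 2],
                 [2, 3, 2],
                 [4, 4, 3],
                 [1, 2, 1]])

alpha = cronbach_alpha(demo)
flags = clinically_significant(demo)
print(round(alpha, 3), flags.tolist())
```

For this toy matrix the item variances sum to 13/3 and the total-score variance is 11, so α = (3/2)(1 − (13/3)/11) = 10/11 ≈ 0.909.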

Fig 1. Histograms of HIV sexual risk scores for treatment and control groups at baseline and 6-month visits.

https://doi.org/10.1371/journal.pone.0195894.g001

Fig 2. Histograms of self-efficacy on avoiding multiple partners for treatment and control groups at baseline and 6-month visits.

https://doi.org/10.1371/journal.pone.0195894.g002

Fig 3. Histograms of self-efficacy on condom use for treatment and control groups at baseline and 6-month visits.

https://doi.org/10.1371/journal.pone.0195894.g003

Fig 4. Boxplots of viral load for treatment and control groups at baseline and 6-month visits.

https://doi.org/10.1371/journal.pone.0195894.g004

Fig 5. Boxplots of HIV stigma for treatment and control groups at baseline and 6-month visits.

https://doi.org/10.1371/journal.pone.0195894.g005

Fig 6. Boxplots of mental health for treatment and control groups at baseline and 6-month visits.

https://doi.org/10.1371/journal.pone.0195894.g006

Hypothesis testing on intervention effect

In the Thai Healthy Choices study, the effect of the four-session motivational interviewing-based intervention was evaluated simultaneously on six primary clinical measures: HIV sexual risk score, viral load, HIV stigma, mental health, self-efficacy of condom use, and self-efficacy of avoiding multiple sex partners. One approach to determining whether the intervention effect is statistically significant is to use the robust nonparametric test statistics to test whether the intervention group and the control group differ in the 6-dimensional multivariate clinical measure at the end of the study (i.e., at the 6-month visit). A total of 74 HIV-positive men who have sex with men were included in this analysis: 37 individuals in the intervention group and 37 individuals in the control group [18]. Sixteen participants had missing values, which were replaced with the sample mean of the corresponding variable within each group.
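
The within-group mean imputation described above can be sketched as follows. This is only an illustrative Python version under stated assumptions: missing entries are coded as NaN, the function name `impute_group_means` is hypothetical, and the small data matrix is invented for the example rather than drawn from the study.

```python
import numpy as np


def impute_group_means(values, groups):
    """Replace NaNs with the column mean computed within each group."""
    values = np.array(values, dtype=float)      # work on a copy
    groups = np.asarray(groups)
    for g in np.unique(groups):
        rows = groups == g
        block = values[rows]                    # rows belonging to group g
        col_means = np.nanmean(block, axis=0)   # means ignoring missing entries
        missing = np.isnan(block)
        # Fill each missing cell with the mean of its own column in this group.
        block[missing] = np.take(col_means, np.nonzero(missing)[1])
        values[rows] = block
    return values


# Hypothetical data: two measures, three intervention and two control rows.
data = [[1.0, 10.0],
        [np.nan, 20.0],
        [3.0, np.nan],
        [5.0, 40.0],
        [7.0, np.nan]]
labels = ["intervention", "intervention", "intervention", "control", "control"]

filled = impute_group_means(data, labels)
print(filled)
```

In the toy data, the missing first measure in the intervention group is filled with 2 (the mean of 1 and 3), while the missing second measure in the control group is filled with 40, the only observed control value.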

Differences between sample means, medians and two Hodges-Lehmann location estimators of intervention and control groups are reported in S7 Table for each individual clinical measure. These differences suggest that the intervention effect may be driven by HIV sexual risk score and HIV stigma. Hypothesis tests were conducted to formally determine whether there was a distributional difference between the intervention and control groups at the baseline and 6-month visits. The null hypothesis was that the probability distributions of the multivariate clinical measure for the intervention and control groups are identical, and the alternative hypothesis was that there was a location shift between the distributions of the multivariate clinical measure for the intervention and control groups. The proposed test statistics T_l and , l = 2, 3, 4, 5, as well as Hotelling’s T² and the extended U statistic, were applied to detect the location shift. The hypothesis testing results, including the values of the test statistics and the corresponding p-values, are reported in the upper panel of Table 7. For the baseline samples, all test statistics failed to reject the null hypothesis at the significance level of 0.05, suggesting that there was no statistically significant difference between the probability distributions of the two groups at the baseline visit. For the samples collected at the 6-month follow-up visit, the test statistics T_l and , l = 2, 3, and Hotelling’s T² statistic rejected the null hypothesis but the others did not. This implies that the distributional locations of the samples collected from the two study groups may be statistically different after 6 months of intervention. Furthermore, we compared the two samples collected at the baseline visit and the 6-month visit within each of the two study groups.
The null hypothesis was that the probability distributions of the multivariate clinical measure are identical at the baseline and the 6-month visits for the intervention group or for the control group, and the alternative hypothesis was that there was a location shift between the distributions of the multivariate clinical measure at the baseline and the 6-month visits within each group. The hypothesis testing results are reported in the lower panel of Table 7. For the intervention group, the test statistics and rejected the null hypothesis whereas the others did not. For the control group, none of the test statistics rejected the null hypothesis.
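
To make the testing logic concrete, the sketch below runs a two-sample permutation test of identical distributions against a location shift, using the maximum absolute component-wise median difference as the statistic. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the add-one p-value convention, and the simulated 37 × 6 samples are all hypothetical, and the paper's scaled statistics are not reproduced here.

```python
import numpy as np


def max_median_diff(x, y):
    """Maximum absolute component-wise difference between sample medians."""
    return np.max(np.abs(np.median(x, axis=0) - np.median(y, axis=0)))


def permutation_pvalue(x, y, statistic, n_perm=999, seed=0):
    """Permutation p-value: reshuffle group labels and recompute the statistic."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.vstack([x, y])
    n = x.shape[0]
    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])  # random relabelling of rows
        if statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Simulated example (not study data): 37 subjects per group, 6 measures,
# with a shift of 2 units in the first measure of the "treated" sample.
rng = np.random.default_rng(1)
control = rng.normal(size=(37, 6))
treated = rng.normal(size=(37, 6))
treated[:, 0] += 2.0

obs, p = permutation_pvalue(treated, control, max_median_diff)
print(f"statistic = {obs:.3f}, permutation p-value = {p:.4f}")
```

Because the statistic is passed in as a function, the same routine could in principle be reused with a Hodges-Lehmann-based or scaled statistic in place of `max_median_diff`.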

Table 7. Values of test statistics and corresponding p-values for comparison of intervention and control groups at baseline and 6-month visits (upper panel) and for comparison of baseline and 6-month visits within each group (lower panel).

https://doi.org/10.1371/journal.pone.0195894.t007

The hypothesis testing results indicate that there was a statistically significant intervention effect for the four-session motivational interviewing-based intervention developed in the Thai Healthy Choices study to reduce risk behaviors among youth living with HIV. A difference in the probability distributions of the multivariate clinical measure between the intervention and control groups was detected after 6 months of intervention. Such a difference was also confirmed between the baseline and 6-month follow-up visits for the intervention group.

Conclusions

This article proposes a series of robust nonparametric test statistics for detecting location shifts between two multivariate samples. The test statistics are constructed from robust estimators of distribution location, including the medians, the two Hodges-Lehmann estimators, and the extended U statistic. Four classes of test statistics are proposed: (i) maximum of the component-wise medians or the Hodges-Lehmann estimators, (ii) scaled maximum of the component-wise medians or the Hodges-Lehmann estimators, (iii) maximum of the scaled component-wise medians or the Hodges-Lehmann estimators, and (iv) the extended U statistic. The simulation studies suggest that the proposed robust nonparametric test statistics are effective alternatives to Hotelling’s T². The simulation studies also show that the nonparametric tests built upon the Hodges-Lehmann estimators are generally more powerful than the others. Numerous nonparametric hypothesis testing procedures have been proposed for comparing a treatment group and a control group in clinical trials with a multivariate endpoint, in the context of the nonparametric Behrens-Fisher hypothesis testing problem [19–22]. Further investigation that compares these hypothesis testing procedures with the procedures included in this article may be relevant.
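
For readers unfamiliar with the Hodges-Lehmann estimators named above, the following Python sketch gives the standard textbook forms: the one-sample estimator (median of Walsh averages) and the two-sample shift estimator (median of all pairwise differences), together with a component-wise maximum in the spirit of class (i). The exact definitions and scaling used for the proposed statistics are given in the earlier sections of the article, so the function names and the combination shown here should be read as assumptions for illustration only.

```python
import numpy as np


def hl_one_sample(v):
    """One-sample Hodges-Lehmann estimate: median of Walsh averages, i <= j."""
    v = np.asarray(v, dtype=float)
    i, j = np.triu_indices(v.size)              # all index pairs with i <= j
    return np.median((v[i] + v[j]) / 2.0)


def hl_two_sample_shift(x, y):
    """Two-sample Hodges-Lehmann shift: median of all differences x_i - y_j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.median(x[:, None] - y[None, :])


def max_hl_shift(x, y):
    """Largest absolute component-wise two-sample Hodges-Lehmann shift."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return max(abs(hl_two_sample_shift(x[:, k], y[:, k]))
               for k in range(x.shape[1]))


# Robustness to an outlier: the sample mean of [1, 2, 3, 100] is 26.5,
# whereas the median of its Walsh averages is 2.75.
location = hl_one_sample([1, 2, 3, 100])

# Toy bivariate samples (hypothetical values).
x = np.array([[1, 0], [2, 0], [3, 0]])
y = np.array([[0, 5], [0, 5], [10, 5]])
shift_first = hl_two_sample_shift(x[:, 0], y[:, 0])
stat = max_hl_shift(x, y)
print(location, shift_first, stat)
```

In the toy example the first component has a shift estimate of 1 despite the outlying value 10 in the second sample, and the second component has a shift of −5, so the maximum absolute component-wise shift is 5.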

Supporting information

S1 Table. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on bootstrap approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with variance-covariance matrix I_4×4 and sample sizes n = m = 10.

https://doi.org/10.1371/journal.pone.0195894.s001

(PDF)

S2 Table. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on bootstrap approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with variance-covariance matrix I_4×4 and sample sizes n = m = 20.

https://doi.org/10.1371/journal.pone.0195894.s002

(PDF)

S3 Table. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on bootstrap approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with variance-covariance matrix I_4×4 and sample sizes n = m = 50.

https://doi.org/10.1371/journal.pone.0195894.s003

(PDF)

S4 Table. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on bootstrap approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with the non-independent variance-covariance matrix and sample sizes n = m = 10.

https://doi.org/10.1371/journal.pone.0195894.s004

(PDF)

S5 Table. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on bootstrap approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with the non-independent variance-covariance matrix and sample sizes n = m = 20.

https://doi.org/10.1371/journal.pone.0195894.s005

(PDF)

S6 Table. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on bootstrap approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with the non-independent variance-covariance matrix and sample sizes n = m = 50.

https://doi.org/10.1371/journal.pone.0195894.s006

(PDF)

S7 Table. Differences between sample means, medians and two Hodges-Lehmann location estimators of intervention and control groups in baseline and 6-month visits.

https://doi.org/10.1371/journal.pone.0195894.s007

(PDF)

S1 File. Data from the Thai Healthy Choices study.

https://doi.org/10.1371/journal.pone.0195894.s008

(XLSX)

Acknowledgments

The authors are thankful to the Academic Editor and the two anonymous referees for their valuable comments and suggestions, which have improved the article substantially. Dr. Xuejun Jiang’s research was partially supported by Natural Science Foundation of China (11101432), Natural Science Foundation of Guangdong Province of China (2017A030313012), and Shenzhen Sci-Tech Fund (JCYJ20170307110329106). Dr. Xu Guo’s research was partially supported by National Natural Science Foundation of China (11601227 and 11626130) and Natural Science Foundation of Jiangsu Province, China (BK20150732). The research project was partially funded by the National Institute of Mental Health (R34MH077523). The authors thank the principal investigator of the project, Dr. Chokechai Rongkavilit at Valley Children’s Hospital, for generously providing data for statistical analysis.

References

1. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics. 1947; 18: 50–60.
2. Yuen KK. The two-sample trimmed t for unequal population variances. Biometrika. 1974; 61: 165–170.
3. Keselman HJ, Wilcox RR, Kowalchuk RK, Olejnik S. Comparing trimmed or least squares means of two independent skewed populations. Biometrical Journal. 2002; 44: 478–489.
4. Fried R, Dehling H. Robust nonparametric tests for the two-sample location problem. Statistical Methods and Applications. 2011; 20: 409–422.
5. Mathur SK. A new nonparametric bivariate test for two sample location problem. Statistical Methods and Applications. 2009; 18: 375–388.
6. Hotelling H. The generalization of Student’s ratio. Annals of Mathematical Statistics. 1931; 2: 360–378.
7. Hettmansperger TP, Oja H. Affine invariant multivariate multisample sign tests. Journal of the Royal Statistical Society, Series B. 1994; 56: 235–249.
8. Hettmansperger TP, Möttönen J, Oja H. Affine invariant multivariate rank tests for several samples. Statistica Sinica. 1998; 8: 785–800.
9. Neuhaus G, Zhu LX. Permutation tests for multivariate location problem. Journal of Multivariate Analysis. 1999; 69: 167–192.
10. Henze N, Kar B, Zhu LX. Checking the adequacy of the multivariate semiparametric location shift model. Journal of Multivariate Analysis. 2005; 93: 238–256.
11. Rongkavilit C, Naar-King S, Wang B, Panthong A, Bunupuradah T, Parsons JT, et al. Motivational interviewing targeting risk behaviors for youth living with HIV in Thailand. AIDS and Behavior. 2013; 17: 2063–2074. pmid:23325376
12. Hodges JL, Lehmann EL. Estimates of location based on rank tests. Annals of Mathematical Statistics. 1963; 34: 598–611.
13. Chaudhuri P. Multivariate location estimation using extension of R-estimates through U-statistics type approach. Annals of Statistics. 1992; 20: 897–916.
14. Ghosh M, Parr WC, Singh K, Babu GJ. A note on bootstrapping the sample median. The Annals of Statistics. 1984; 12: 1130–1135.
15. Fernández VA, Gamero MJ, García JM. A test for the two-sample problem based on empirical characteristic functions. Computational Statistics & Data Analysis. 2008; 52: 3730–3748.
16. Alba-Fernández MV, Batsidis A, Jiménez-Gamero MD, Jodrá P. A class of tests for the two-sample problem for count data. Journal of Computational and Applied Mathematics. 2017; 318: 220–229.
17. Berger BE, Ferrans CE, Lashley FR. Measuring stigma in people with HIV: Psychometric assessment of the HIV stigma scale. Research in Nursing and Health. 2001; 24: 518–529. pmid:11746080
18. Rongkavilit C, Wang B, Naar-King S, Bunupuradah T, Parsons JT, Panthong A, et al. Motivational interviewing targeting risky sex in HIV-positive young Thai men who have sex with men. Archives of Sexual Behavior. 2015; 44: 329–340. pmid:24668304
19. O’Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984; 40: 1079–1087.
20. Huang P, Tilley BC, Woolson RF, Lipsitz S. Adjusting O’Brien’s test to control type I error for the generalized nonparametric Behrens-Fisher problem. Biometrics. 2005; 61: 532–539. pmid:16011701
21. Liu A, Li Q, Liu C, Yu K, Yu KF. A rank-based test for comparison of multidimensional outcomes. Journal of the American Statistical Association. 2010; 105: 578–587. pmid:21625372
22. Li Z, Cao F, Zhang J, Li Q. Summation of absolute value test for multiple outcome comparison with moderate effect. Journal of Systems Science and Complexity. 2013; 26: 462–469.
