Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV.


Introduction
In randomized controlled trials, the effectiveness (or efficacy) of a treatment is commonly characterized by the difference between the distributional locations of a treatment group and its control group. Hypothesis testing is the primary statistical inference approach for examining treatment effects in clinical trials, where it is conducted to detect whether there is any difference between the distributional locations of the treatment group and the control group. When the primary endpoint is one-dimensional and normally distributed for both study groups, the two-sample t test is the standard tool. Yet the two-sample t test may not be valid when the normality assumption is violated, and it is not robust to outliers or heavy-tailed distributions. A number of robust nonparametric tests have been developed in the literature as complements to the two-sample t test. The classic Wilcoxon-Mann-Whitney test [1], which uses the rank sum, is a nonparametric counterpart of the two-sample t test. Yuen [2] and Keselman et al. [3] recommended constructing tests from trimmed means. Recently, Fried and Dehling [4] proposed a series of robust nonparametric tests for detecting a univariate two-sample location difference. These tests were constructed from unscaled and properly scaled robust location estimators, including medians and Hodges-Lehmann estimators. The numerical studies reported by Fried and Dehling [4] showed that the test statistics were robust to outliers and non-normality and efficient in detecting a univariate two-sample location shift. Mathur [5] proposed a strictly nonparametric bivariate test constructed from an extended U statistic and concluded that the test statistic did not depend on the covariance structure of the underlying population and was more powerful than the existing tests.
In randomized controlled trials, the effectiveness of a treatment can be defined not by one or two, but by multiple primary endpoints, and the significance of the treatment effect is then determined by the multivariate location shift between the two multivariate distributions of the treatment and control groups. In these clinical trials, multivariate hypothesis testing procedures are needed to detect a potential location shift between two samples that are drawn with a multivariate primary endpoint. The conventional univariate two-sample t test was extended to the multivariate setting by Hotelling [6], and the resulting test statistic is known as Hotelling's $T^2$ statistic. Hotelling's $T^2$ test inherits the limitations of the univariate two-sample t test: it is not robust to multivariate outliers and is not valid when the multivariate normality assumption is violated. This motivated the development of alternative multivariate two-sample location tests. Hettmansperger and Oja [7] developed a multivariate sign test for detecting location deviation among multiple multivariate samples. Hettmansperger et al. [8] introduced affine invariant analogues of the two-sample Mann-Whitney-Wilcoxon rank sum test. Neuhaus and Zhu [9] proposed multivariate distribution-free permutation test statistics that were built upon projected univariate versions of multivariate data. Henze et al. [10] introduced a class of consistent tests, in which the test statistic is a weighted integral of the squared modulus of the difference of the empirical characteristic functions of one multivariate sample and another multivariate sample plus a location shift.
In this article, we extend the robust nonparametric test statistics proposed by Fried and Dehling [4] and Mathur [5] to the multivariate setting. A series of robust multivariate nonparametric tests are proposed using the component-wise medians, Hodges-Lehmann location estimators, and an extended U statistic. Univariate test statistics for detecting a multivariate two-sample location shift are constructed for the robust multivariate nonparametric tests as (i) the unscaled maximum of the component-wise medians or the Hodges-Lehmann estimators, (ii) the scaled maximum of the component-wise medians or the Hodges-Lehmann estimators, (iii) the maximum of the scaled component-wise medians or the Hodges-Lehmann estimators, or (iv) the extended U statistic. A bootstrap approach and a permutation approach are introduced for determining the p-values of the proposed test statistics. We conduct simulation studies to examine the performance of the proposed robust nonparametric test statistics in detecting a multivariate two-sample location shift. The numerical results given by the bootstrap procedure demonstrate that the proposed robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is more powerful than the bootstrap procedure. To demonstrate their use, the proposed hypothesis tests are applied to detecting the intervention effect in the Thai Healthy Choices study [11], a study that promotes a four-session motivational interviewing-based intervention to reduce risk behaviors among youth living with HIV (the Thai Healthy Choices study was designed jointly by Wayne State University and the Thai Red Cross AIDS Research Center, and implemented in Bangkok, Thailand).
The scientific contribution of this article is multifold. First, a series of new robust nonparametric test statistics are proposed for detecting a location shift between two multivariate samples collected from the treatment and control groups, respectively, in clinical trials. Both a bootstrap approach and a permutation approach are introduced to implement the proposed tests and obtain the corresponding p-values. These provide practitioners with a variety of test statistics and two distinct implementation approaches for testing treatment effects when the normality assumption for the samples is violated. Second, comprehensive numerical studies are conducted, and the results show explicit benefits from using the proposed tests over Hotelling's $T^2$ and the extended U tests in terms of controlling Type I error and boosting statistical power. Third, the article presents a representative example, the Thai Healthy Choices study, and shows how the proposed robust nonparametric hypothesis testing procedures can be implemented to test the treatment or intervention effect in a clinical trial.

Tests on two-sample location shift
Let $\{X_1, \cdots, X_m\}$ and $\{Y_1, \cdots, Y_n\}$ be two samples of independent p-dimensional observations drawn from the distributions $F(x)$ and $G(x)$, respectively. The null hypothesis of equality of $F(x)$ and $G(x)$ and its alternative hypothesis that there is a location shift between the two multivariate distributions are

$$H_0: F(x) = G(x) \quad \text{versus} \quad H_1: G \text{ is a location shift of } F \text{ by some } \Delta \neq 0. \qquad (1)$$

A natural idea for testing the above hypotheses is to compare location estimators of the two distributions. The sample means $\bar{X}$ and $\bar{Y}$ can be used for this purpose, which leads to the prominent Hotelling's $T^2$ test. However, Hotelling's $T^2$ test is constructed under multivariate normal distributions, and therefore performs poorly when there are outliers or when the underlying true distributions of the two samples are not multivariate normal.
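For reference, the classical two-sample Hotelling statistic mentioned above can be written as follows. This is the standard textbook form rather than a formula quoted from this article; $S_X$ and $S_Y$ (the two sample covariance matrices) and $S_{\text{pooled}}$ are notation introduced here for illustration only.

```latex
T^2 = \frac{mn}{m+n}\,(\bar{X}-\bar{Y})'\,S_{\text{pooled}}^{-1}\,(\bar{X}-\bar{Y}),
\qquad
S_{\text{pooled}} = \frac{(m-1)\,S_X + (n-1)\,S_Y}{m+n-2},
\qquad
\frac{m+n-p-1}{p\,(m+n-2)}\,T^2 \sim F_{p,\;m+n-p-1}
\ \text{ under } H_0 .
```

The F reference distribution holds under multivariate normality with a common covariance matrix. The statistic is driven entirely by sample means and covariances, which is why outliers and heavy tails can distort it.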

Tests based on unscaled median difference and Hodges-Lehmann estimators
Here, we propose a series of robust nonparametric test statistics, based on robust estimators of distribution location, as competitors of Hotelling's $T^2$ test statistic. A general approach to constructing such nonparametric tests is to estimate the location difference $\Delta$ and then reject the null hypothesis $H_0$ if the estimate is far from zero. As usual, we can replace the difference of sample means with the difference between the two sample medians:

$$\hat{D}_1 = \mathrm{med}\{X_1, \cdots, X_m\} - \mathrm{med}\{Y_1, \cdots, Y_n\}. \qquad (2)$$

In (2), $\mathrm{med}\{X_1, \cdots, X_m\}$ and $\mathrm{med}\{Y_1, \cdots, Y_n\}$ are the p-dimensional median vectors of the two samples. The median vector of a sample is defined as the vector of component-wise medians. That is, the kth component of $\mathrm{med}\{X_1, \cdots, X_m\}$ is the median of $X_{1k}, \cdots, X_{mk}$, where $X_{ik}$ is the kth component of the p-dimensional observation $X_i$ for $i = 1, \cdots, m$, and the kth component of $\mathrm{med}\{Y_1, \cdots, Y_n\}$ is the median of $Y_{1k}, \cdots, Y_{nk}$, where $Y_{jk}$ is the kth component of $Y_j$ for $j = 1, \cdots, n$. In practice, however, $\hat{D}_1$ cannot be directly used as a test statistic for the pair of hypotheses in (1), since $\hat{D}_1$ is a p-dimensional vector rather than a scalar.
Therefore, the following maximum of the absolute values of the p components of $\hat{D}_1$ can be considered:

$$\hat{D}_1^{\max} = \max\{|\hat{D}_{11}|, |\hat{D}_{12}|, \cdots, |\hat{D}_{1p}|\},$$

where $\hat{D}_{1k}$ is the kth component of $\hat{D}_1$ for $k = 1, \cdots, p$. Under the null hypothesis, $\hat{D}_1^{\max}$ should be close to zero, whereas under the alternative hypothesis it deviates from zero.
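As a rough illustration of the unscaled median-based statistic, the following Python sketch computes component-wise median vectors and the maximum absolute component of their difference. The function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import median

def componentwise_median(sample):
    """Median vector: the k-th entry is the median of the k-th coordinates."""
    return [median(col) for col in zip(*sample)]

def unscaled_median_stat(xs, ys):
    """Largest absolute component of the difference of the two median vectors."""
    mx = componentwise_median(xs)
    my = componentwise_median(ys)
    return max(abs(a - b) for a, b in zip(mx, my))

# Toy bivariate samples (illustrative values only).
xs = [[1.0, 2.0], [2.0, 0.0], [9.0, 1.0]]
ys = [[0.0, 1.0], [1.0, 5.0], [2.0, 3.0]]
stat = unscaled_median_stat(xs, ys)
```

The extreme value 9.0 in the first sample leaves the statistic unchanged as long as it stays above the sample median, which is the robustness property the median-based construction relies on.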
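Note that, although the sample medians $\mathrm{med}\{X_1, \cdots, X_m\}$ and $\mathrm{med}\{Y_1, \cdots, Y_n\}$ in (2) are robust estimators of the locations of the two samples, they are not very efficient, as each of them exploits little of the information in the sample data. To balance efficiency against robustness, two types of Hodges-Lehmann estimators were developed [12,13]. Multivariate analogs of the univariate Hodges-Lehmann estimators are

$$\hat{D}_2 = \mathrm{med}\left\{\frac{X_i + X_j}{2};\ 1 \le i < j \le m\right\} - \mathrm{med}\left\{\frac{Y_i + Y_j}{2};\ 1 \le i < j \le n\right\}$$

and

$$\hat{D}_3 = \mathrm{med}\{X_i - Y_j;\ i = 1, \cdots, m,\ j = 1, \cdots, n\},$$

where the p-dimensional multivariate median vectors are likewise defined as in (2). The test statistics using the multivariate Hodges-Lehmann estimators to detect a location shift between the two samples are

$$\hat{D}_2^{\max} = \max\{|\hat{D}_{21}|, |\hat{D}_{22}|, \cdots, |\hat{D}_{2p}|\} \quad \text{and} \quad \hat{D}_3^{\max} = \max\{|\hat{D}_{31}|, |\hat{D}_{32}|, \cdots, |\hat{D}_{3p}|\},$$

where $\hat{D}_{2k}$ and $\hat{D}_{3k}$ are the kth components of $\hat{D}_2$ and $\hat{D}_3$, respectively, for $k = 1, \cdots, p$.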
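The two Hodges-Lehmann-type statistics can be sketched in Python as below. The sketch assumes the usual forms of these estimators: the component-wise median of pairwise averages within each sample (pairs with i < j) and the component-wise median of all between-sample differences. The index convention and the function names are assumptions made for illustration.

```python
from itertools import combinations
from statistics import median

def hl_one_sample(sample):
    """Component-wise median of the pairwise averages (i < j) within one sample."""
    dims = range(len(sample[0]))
    return [median((a[k] + b[k]) / 2 for a, b in combinations(sample, 2))
            for k in dims]

def hl_two_sample(xs, ys):
    """Component-wise median of all pairwise differences X_i - Y_j."""
    dims = range(len(xs[0]))
    return [median(x[k] - y[k] for x in xs for y in ys) for k in dims]

def hl_stats(xs, ys):
    """Return the maximum absolute component of each Hodges-Lehmann difference."""
    d2 = [a - b for a, b in zip(hl_one_sample(xs), hl_one_sample(ys))]
    d3 = hl_two_sample(xs, ys)
    return max(map(abs, d2)), max(map(abs, d3))

# One-dimensional toy data (illustrative values only).
xs = [[0.0], [2.0], [4.0]]
ys = [[1.0], [1.0], [1.0]]
d2_max, d3_max = hl_stats(xs, ys)
```

Both estimators use every pair of observations, which is where their efficiency gain over the plain median comes from; the cost grows quadratically in the sample sizes.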

Tests based on scaled median difference and Hodges-Lehmann estimators
Since $\hat{D}_1^{\max}$, $\hat{D}_2^{\max}$ and $\hat{D}_3^{\max}$ only measure the component-wise maximum variability between the two samples, a scaled version of each is required to construct a more robust nonparametric test statistic. To this end, a related measure of the variability within the two samples is needed for standardization. For $\hat{D}_1^{\max}$, a p-dimensional median vector $S_1$ serves as the measure of component-wise variability. It is computed from the joint median-corrected sample $(Z_1, \cdots, Z_{m+n})' = (X_1 - \tilde{X}, \cdots, X_m - \tilde{X}, Y_1 - \tilde{Y}, \cdots, Y_n - \tilde{Y})'$, where $\tilde{X} = \mathrm{med}\{X_1, \cdots, X_m\}$ and $\tilde{Y} = \mathrm{med}\{Y_1, \cdots, Y_n\}$. The absolute component-wise maximum of $S_1$ is denoted $S_1^{\max}$, and the standardized version of $\hat{D}_1^{\max}$, denoted $T_1$, is the ratio of $\hat{D}_1^{\max}$ to $S_1^{\max}$. For $\hat{D}_2^{\max}$ and $\hat{D}_3^{\max}$, two p-dimensional median vectors, $S_2$ and $S_3$, are taken from the absolute differences within the samples and within the joint median-corrected sample, respectively, as measures of component-wise variability within the two samples. Their absolute component-wise maxima are denoted $S_2^{\max}$ and $S_3^{\max}$, respectively. The scaled versions of $\hat{D}_2^{\max}$ and $\hat{D}_3^{\max}$, denoted $T_2, \cdots, T_5$, are then constructed by dividing $\hat{D}_2^{\max}$ and $\hat{D}_3^{\max}$ by $S_2^{\max}$ or $S_3^{\max}$.
An alternative standardization of $\hat{D}_1^{\max}$, $\hat{D}_2^{\max}$ and $\hat{D}_3^{\max}$ can also be applied, which is to standardize each component and then take the maximum of all standardized components. This alternative standardization leads to the test statistics $T_l^*$, $l = 1, 2, \cdots, 5$, for detecting a multivariate two-sample location shift, in which the kth component of the location estimator is scaled by $S_{lk}$, the kth element of $S_l$, $l = 1, 2, 3$. Under the null hypothesis, the proposed test statistics $T_l$ and $T_l^*$, $l = 1, 2, \cdots, 5$, should be close to zero, whereas they should be far from zero under the alternative hypothesis. When the dimension p is equal to 1, these test statistics reduce to the test statistics introduced by Fried and Dehling [4], and $T_l = T_l^*$ for $l = 1, 2, \cdots, 5$.
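The two standardization schemes can be contrasted with a small Python sketch for the median-based statistic. It assumes that the scale vector is the component-wise median of the absolute values of the joint median-corrected sample; any constant multiplier used in the article's definition is not reproduced here, and the function names are hypothetical.

```python
from statistics import median

def med_vec(sample):
    """Vector of component-wise medians."""
    return [median(col) for col in zip(*sample)]

def scaled_median_stats(xs, ys):
    """Return (scaled maximum, maximum of scaled components) for the median test."""
    mx, my = med_vec(xs), med_vec(ys)
    d1 = [a - b for a, b in zip(mx, my)]
    # Joint median-corrected sample: each sample is centred at its own median.
    z = [[v - c for v, c in zip(row, mx)] for row in xs]
    z += [[v - c for v, c in zip(row, my)] for row in ys]
    # Assumed scale: component-wise median of absolute centred values.
    s1 = [median(abs(v) for v in col) for col in zip(*z)]
    scaled_max = max(abs(d) for d in d1) / max(s1)
    max_of_scaled = max(abs(d) / s for d, s in zip(d1, s1))
    return scaled_max, max_of_scaled

# Toy data: the shift sits in the low-variability first component.
xs = [[0.0, 0.0], [1.0, 10.0], [2.0, 20.0]]
ys = [[3.0, 0.0], [4.0, 10.0], [5.0, 20.0]]
t1, t1_star = scaled_median_stats(xs, ys)
```

In this toy case the two versions differ by a factor of ten, because the single scale of the first version is dominated by the noisier second component. A zero scale component would cause a division error, so real data with heavy ties would need a guard.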

Tests based on U statistics
A U-statistic recently proposed by Mathur [5] was designed to test for a bivariate two-sample location shift. Here, we extend it to detect a multivariate two-sample location shift. Specifically, the extended U test statistic for multivariate two-sample location detection is defined in terms of $D_{1i}^2 = \|X_i\|^2$, the squared Euclidean distance from $X_i$ to the origin for $i = 1, \cdots, m$, and $D_{2j}^2 = \|Y_j\|^2$, the squared Euclidean distance from $Y_j$ to the origin for $j = 1, \cdots, n$. The null hypothesis is rejected if the observed value of the extended U statistic exceeds a critical value of U obtained by permutation.

Implementation: A bootstrap procedure
Here, a bootstrap procedure is introduced to numerically approximate the p-values of the proposed robust nonparametric tests. Suppose two random samples of independent p-dimensional observations, $\{X_i; i = 1, \cdots, m\}$ and $\{Y_j; j = 1, \cdots, n\}$, are drawn from $F(x)$ and $G(x)$, respectively, and consider the null hypothesis $H_0: F(x) = G(x)$ and its alternative hypothesis $H_1$ of a location shift between the two multivariate distributions. The distributions of the proposed test statistics are generally unknown in finite samples. Therefore, a bootstrap method can be adopted to approximate the underlying distribution of a test statistic and subsequently determine the corresponding p-value. In the bootstrap procedure, a pseudo sample $\{X_i^*; i = 1, \cdots, m\}$ is drawn from the pooled sample $\{X_i, Y_j; i = 1, \cdots, m, j = 1, \cdots, n\}$ with replacement, and another pseudo sample $\{Y_j^*; j = 1, \cdots, n\}$ is drawn from the same pooled sample, also with replacement. Let $V$ denote any one of the investigated test statistics, and let $V^*$ be the bootstrap version of $V$ that is calculated from the paired bootstrap pseudo samples $\{X_i^*\}$ and $\{Y_j^*\}$. Then, the null hypothesis is rejected if $V$ is larger than the upper $100\alpha\%$ quantile (that is, the $(1-\alpha)$ quantile) of the bootstrap distribution of $V^*$, where $\alpha$ is the significance level of the test. It has been confirmed in the literature that the above bootstrap procedure can produce a valid approximation to the distribution of the test statistic $V$ [14-16].
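A minimal Python sketch of the pooled bootstrap is given below, using the unscaled median statistic as an example of $V$. The p-value is taken here as the share of bootstrap statistics that are at least as large as the observed one; this convention, the fixed seed, and all function names are assumptions made for illustration.

```python
import random
from statistics import median

def max_median_diff(xs, ys):
    """Example statistic V: largest absolute gap between component-wise medians."""
    mx = [median(col) for col in zip(*xs)]
    my = [median(col) for col in zip(*ys)]
    return max(abs(a - b) for a, b in zip(mx, my))

def bootstrap_pvalue(xs, ys, stat=max_median_diff, n_boot=500, seed=0):
    """Share of pooled-bootstrap statistics V* that are at least the observed V."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    pooled = list(xs) + list(ys)
    observed = stat(xs, ys)
    hits = 0
    for _ in range(n_boot):
        x_star = rng.choices(pooled, k=len(xs))  # drawn with replacement
        y_star = rng.choices(pooled, k=len(ys))  # same pool, with replacement
        if stat(x_star, y_star) >= observed:
            hits += 1
    return hits / n_boot

# Toy bivariate samples (illustrative values only).
xs = [[0.1, 1.2], [0.4, 0.8], [-0.3, 1.1], [0.2, 0.9], [0.0, 1.0]]
ys = [[1.1, 1.9], [0.9, 2.2], [1.4, 2.0], [1.0, 1.7], [1.2, 2.1]]
p_value = bootstrap_pvalue(xs, ys)
```

Rejecting when this p-value falls below $\alpha$ corresponds to comparing $V$ with an upper quantile of the bootstrap distribution of $V^*$. Any other statistic with the same two-argument signature can be passed as `stat`.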

Implementation: A permutation procedure
A permutation procedure is a competitive alternative to the bootstrap procedure for deriving critical values for the proposed robust nonparametric tests. In the permutation procedure, the pooled sample $\{X_i, Y_j; i = 1, \cdots, m, j = 1, \cdots, n\}$ is repeatedly split at random into two pseudo samples $\{X_i^*; i = 1, \cdots, m\}$ and $\{Y_j^*; j = 1, \cdots, n\}$, and the permutation version $V^*$ of the test statistic $V$ is calculated from each split. Then, the null hypothesis is rejected if $V$ is larger than the upper $100\alpha\%$ quantile (that is, the $(1-\alpha)$ quantile) of the permutation distribution of $V^*$, where $\alpha$ is the significance level of the test.
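The permutation counterpart differs only in how the pseudo samples are formed: the pooled sample is shuffled and cut into groups of sizes m and n, so each observation is used exactly once. The sketch below is illustrative; the p-value convention, the number of random splits, and the function names are assumptions.

```python
import random
from statistics import median

def max_median_diff(xs, ys):
    """Example statistic V: largest absolute gap between component-wise medians."""
    mx = [median(col) for col in zip(*xs)]
    my = [median(col) for col in zip(*ys)]
    return max(abs(a - b) for a, b in zip(mx, my))

def permutation_pvalue(xs, ys, stat=max_median_diff, n_perm=500, seed=0):
    """Share of random relabellings whose statistic is at least the observed V."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    pooled = list(xs) + list(ys)
    m = len(xs)
    observed = stat(xs, ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random split without replacement
        if stat(pooled[:m], pooled[m:]) >= observed:
            hits += 1
    return hits / n_perm

# Toy bivariate samples (illustrative values only).
xs = [[0.1, 1.2], [0.4, 0.8], [-0.3, 1.1], [0.2, 0.9], [0.0, 1.0]]
ys = [[1.1, 1.9], [0.9, 2.2], [1.4, 2.0], [1.0, 1.7], [1.2, 2.1]]
p_value = permutation_pvalue(xs, ys)
```

With very small samples the number of distinct splits is limited, so the attainable p-values are coarse; enumerating all splits is an alternative when m and n are small.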

Simulation studies
This section reports numerical results from a simulation study conducted to demonstrate the merits of the proposed hypothesis tests and to compare them with Hotelling's $T^2$. In this simulation study, we examine and compare the performance of the proposed hypothesis tests for detecting a location shift among different pairs of samples. The sample $\{X_1, \cdots, X_m\}$ was generated from $F(x)$ and the sample $\{Y_1, \cdots, Y_n\}$ was generated from $G(x)$. Four different pairs of distributions $F(x)$ and $G(x)$ were considered: (i) $F(x)$ was a p-dimensional multivariate normal distribution $N_p(1_p, \Sigma_p)$, where $1_p$ is a p-dimensional vector with each component equal to one and $\Sigma_p$ is the variance-covariance matrix, and $G(x)$ was the location shift distribution $N_p(1_p + \Delta, \Sigma_p)$; (ii) and (iii) $F(x)$ was a multivariate t distribution and $G(x)$ was the corresponding location shift distribution, which in (iii) was $t_3(1_p + \Delta, \Sigma_p)$; and (iv) $F(x)$ was the p-dimensional joint distribution of the diagonal elements of a Wishart random matrix that followed the Wishart distribution $W_p(3, \Sigma_p)$, where 3 is the degrees of freedom and $\Sigma_p$ is the scale matrix, and $G(x) = F(x + \Delta)$ was the location shift distribution. Two variance-covariance matrices were used to generate the simulation data: one is the independent variance-covariance matrix $I_{p \times p}$, the $p \times p$ identity matrix, and the other is a non-independent variance-covariance matrix with diagonal elements equal to 1 and off-diagonal elements equal to 0.5. The dimension p of the two samples was set to 4, and the sample sizes were set to n = m = 10, 25, and 50. The location vector $\Delta$ in each of the four distributions was specified as $\Delta = (0.5\delta, \delta, \delta, 2\delta)'$, in which $\delta$ took a value of 0, 0.5, 1, 1.5, or 2. The proposed test statistics $\hat{D}_1^{\max}$, $\hat{D}_2^{\max}$, $\hat{D}_3^{\max}$, $T_l$ and $T_l^*$, $l = 1, 2, \cdots, 5$, as well as Hotelling's $T^2$ and the extended U statistic, were applied in two-sample multivariate hypothesis testing to detect the location shift.
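The normal pair in this design can be generated with a few lines of Python. The sketch below builds the shift vector and draws normal observations with unit variances and a common correlation through a shared random factor; that construction is one convenient choice made for illustration and is not described in the article, and the function names and seed are hypothetical.

```python
import math
import random

def shift_vector(delta):
    """Location shift (0.5*delta, delta, delta, 2*delta)' for dimension p = 4."""
    return [0.5 * delta, delta, delta, 2.0 * delta]

def draw_normal_sample(n, mean, rho, rng):
    """Draw n normal vectors with unit variances and common correlation rho.

    Shared-factor construction (valid for 0 <= rho < 1):
    X_k = mean_k + sqrt(rho) * Z_0 + sqrt(1 - rho) * Z_k.
    """
    sample = []
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)
        row = [mu + math.sqrt(rho) * common
               + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0) for mu in mean]
        sample.append(row)
    return sample

rng = random.Random(1)
ones = [1.0] * 4
delta_vec = shift_vector(1.0)
shifted_mean = [a + d for a, d in zip(ones, delta_vec)]
xs = draw_normal_sample(10, ones, 0.5, rng)          # sample from F
ys = draw_normal_sample(10, shifted_mean, 0.5, rng)  # sample from shifted G
```

Setting `rho` to 0 gives the independent case with the identity covariance matrix, and 0.5 gives the non-independent case with off-diagonal elements equal to 0.5.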
A total of 1000 simulated data sets were generated from each pair of specified distributions of the two samples, and each of the investigated test statistics was then applied to these data sets. The rejection rate was subsequently calculated as the proportion of the 1000 simulated data sets in which the null hypothesis $H_0: F(x) = G(x)$ was rejected by each of the investigated test statistics. When $\delta = 0$, the true distributions of the two samples have an identical location, and thus the rejection rate corresponds to the simulated Type I error of the hypothesis tests. When $\delta \neq 0$, the true distributions of the two samples have different locations, and the rejection rate corresponds to the simulated power of the hypothesis tests. In this simulation study, the number of bootstrap samples was set to 500 and the significance level was set to 0.05.
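The rejection-rate calculation can be sketched generically in Python: any routine that returns a p-value for a pair of samples can be plugged in. The data generator and the threshold rule used as a stand-in test below are placeholders for illustration, not the study's procedures, and rejecting when the p-value is below $\alpha$ is an assumed convention.

```python
import random

def rejection_rate(p_value_fn, make_pair, n_sims=1000, alpha=0.05, seed=0):
    """Proportion of simulated data sets in which the test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        xs, ys = make_pair(rng)
        if p_value_fn(xs, ys) < alpha:
            rejections += 1
    return rejections / n_sims

def make_pair(rng, n=10, delta=0.0):
    """Placeholder generator: two univariate normal samples, shifted by delta."""
    xs = [[rng.gauss(1.0, 1.0)] for _ in range(n)]
    ys = [[rng.gauss(1.0 + delta, 1.0)] for _ in range(n)]
    return xs, ys

def stand_in_p_value(xs, ys):
    """Placeholder test: reports 0.01 when the mean gap exceeds 1, else 0.5."""
    mean_x = sum(r[0] for r in xs) / len(xs)
    mean_y = sum(r[0] for r in ys) / len(ys)
    return 0.01 if abs(mean_x - mean_y) > 1.0 else 0.5

# With delta = 0 the rate estimates the Type I error; with delta != 0, the power.
type_one_error = rejection_rate(stand_in_p_value, make_pair, n_sims=200)
```

Because the rate is an average of 1000 binary outcomes, its Monte Carlo standard error at a true rate of 0.05 is about 0.007, which is worth keeping in mind when comparing simulated Type I errors across tests.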
The simulation results of the test statistics based on the bootstrap procedure are presented in S1-S6 Tables. S1-S3 Tables report the Type I errors and power obtained from the simulated paired samples that were generated using the independent variance-covariance matrix with different sample sizes. S4-S6 Tables report the Type I errors and power obtained using the non-independent variance-covariance matrix. When the samples were generated from two multivariate normal distributions with a location shift, Hotelling's $T^2$, the extended U statistic, and $T_l$ and $T_l^*$, $l = 2, 3$, performed the best among all the investigated test statistics in terms of Type I errors and power as $\delta$ varied. There was not sufficient numerical evidence that Hotelling's $T^2$ statistic outperformed the other five statistics. The tests based on the Hodges-Lehmann estimators, $\hat{D}_2^{\max}$, $\hat{D}_3^{\max}$, $T_l$ and $T_l^*$, $l = 2, 3, 4, 5$, were more powerful than those based on medians, $\hat{D}_1^{\max}$, $T_1$ and $T_1^*$. The choice of the measure of variability within the two samples (i.e., the choice of either $S_2$ or $S_3$ and the choice of either $S_2^{\max}$ or $S_3^{\max}$) had very little impact on the performance of the test statistics.
When one sample was generated from a multivariate t distribution or a Wishart distribution and the other sample was generated from its location shift counterpart, the proposed robust nonparametric test statistics outperformed Hotelling's $T^2$ and the extended U statistic in detecting the location shift between the two samples. The power of these robust test statistics was mostly larger than the power of Hotelling's $T^2$ and the extended U statistic. Among the nonparametric test statistics, as in the case of multivariate normal distributions, the tests based on the Hodges-Lehmann estimators were more powerful than those based on the medians. The scaled nonparametric tests generally outperformed their unscaled counterparts. The nonparametric tests based on $T_l$ and $T_l^*$, $l = 2, 3$, were the most powerful among the investigated test statistics, and the Type I errors of these four test statistics were mostly close to 0.05. The power of the investigated test statistics consistently increased as the location difference between the two samples and the sample sizes increased.
The simulation results of the test statistics based on the permutation approach are presented in Tables 1-6. Tables 1-3 report the Type I errors and power obtained from the simulated paired samples that were generated using the independent variance-covariance matrix with different sample sizes. Tables 4-6 report the Type I errors and power obtained using the non-independent variance-covariance matrix. When the samples were generated from two multivariate normal distributions with a location shift, the performance of all the test statistics was comparable. Moreover, when one sample was generated from a multivariate t distribution or a Wishart distribution and the other sample was generated from its location shift counterpart, the proposed robust nonparametric test statistics outperformed Hotelling's $T^2$ and the extended U statistic in detecting the location shift between the two samples, as was also shown by the bootstrap procedure. The tests based on the Hodges-Lehmann estimators, $\hat{D}_2^{\max}$, $\hat{D}_3^{\max}$, $T_l$ and $T_l^*$, $l = 2, 3, 4, 5$, were slightly more powerful than those based on medians, $\hat{D}_1^{\max}$, $T_1$ and $T_1^*$. A cross comparison of the Type I errors and power given by the bootstrap approach and the permutation approach showed that the permutation approach was able to provide a more stringent control of Type I error and was generally more powerful than the bootstrap procedure. The performance of the nonparametric tests $T_l$ and $T_l^*$, $l = 2, 3, 4, 5$, did not differ when the permutation approach was applied. Although in Tables 1-6 the scaled nonparametric test statistics cannot be distinguished from their unscaled counterparts, this result should not be generalized, since Fried and Dehling [4] explicitly demonstrated the advantages of the scaled nonparametric test statistics over the unscaled ones.

Table 1. Type I errors ($\delta = 0$) and power ($\delta \neq 0$) given by the investigated test statistics based on the permutation approach in detecting location shift between two samples generated from the four pairs of $F(x)$ and $G(x)$ with variance-covariance matrix $I_{4 \times 4}$ and sample sizes n = m = 10.

Table 2. Type I errors ($\delta = 0$) and power ($\delta \neq 0$) given by the investigated test statistics based on the permutation approach in detecting location shift between two samples generated from the four pairs of $F(x)$ and $G(x)$ with variance-covariance matrix $I_{4 \times 4}$ and sample sizes n = m = 20.

Table 3. Type I errors ($\delta = 0$) and power ($\delta \neq 0$) given by the investigated test statistics based on the permutation approach in detecting location shift between two samples generated from the four pairs of $F(x)$ and $G(x)$ with variance-covariance matrix $I_{4 \times 4}$ and sample sizes n = m = 50.

Table 4. Type I errors ($\delta = 0$) and power ($\delta \neq 0$) given by the investigated test statistics based on the permutation approach in detecting location shift between two samples generated from the four pairs of $F(x)$ and $G(x)$ with the non-independent variance-covariance matrix and sample sizes n = m = 10.

Table 5. Type I errors ($\delta = 0$) and power ($\delta \neq 0$) given by the investigated test statistics based on the permutation approach in detecting location shift between two samples generated from the four pairs of $F(x)$ and $G(x)$ with the non-independent variance-covariance matrix and sample sizes n = m = 20.
Naturally, the proposed nonparametric test statistics function properly without the multivariate normality assumption that the classical Hotelling's $T^2$ test requires, and they are therefore robust to non-normality and outliers. This is the primary reason why, in the simulation studies, the proposed tests were comparable to Hotelling's $T^2$ and the extended U tests when the two samples were simulated from multivariate normal distributions, and outperformed the two tests when normality did not hold for the simulated samples.

Table 6. Type I errors ($\delta = 0$) and power ($\delta \neq 0$) given by the investigated test statistics based on the permutation approach in detecting location shift between two samples generated from the four pairs of $F(x)$ and $G(x)$ with the non-independent variance-covariance matrix and sample sizes n = m = 50.

Statistical analysis of the Thai Healthy Choices study
This section introduces the Thai Healthy Choices study and reports the results of hypothesis testing on the effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV [11]. The proposed nonparametric robust test statistics $T_l$ and $T_l^*$, $l = 1, 2, \cdots, 5$, as well as Hotelling's $T^2$ and the extended U statistic, were applied in the hypothesis testing procedure to detect a multivariate difference between the intervention and control groups in the study.

The Thai Healthy Choices study
The Thai Healthy Choices study was conducted at the Thai Red Cross AIDS Research Center in Bangkok [11]. Thai youth living with HIV and attending the Thai Red Cross AIDS Research Centre clinics in Bangkok, who were interested in participating in the study, were referred by their physicians to the study team. Participants eligible to enroll in the study were those between 16 and 25 years old, HIV-positive, and able to understand spoken and written Thai well enough to participate in study assessments and sessions. Upon completion of consent, participants were randomized in a one-to-one ratio to receive either a designed intervention named Healthy Choices (intervention group) or general health education (control group). At the baseline visit, participants completed the assessments. After the baseline visit, participants began to attend either four Healthy Choices sessions in the intervention group or four general health education sessions in the control group, based on randomization. The sessions in both groups occurred at 1, 2, 6 and 12 weeks after the baseline visit. Each session took approximately 60 min. Assessments similar to those of the baseline visit were conducted at 1 month after the fourth session and again at 6 months after the fourth session in both groups.
The intervention group received Healthy Choices, a four-session individual-level Motivational Interviewing (MI) counseling intervention that targeted two of three possible risk behaviors: sexual risks, alcohol use, and antiretroviral adherence. The intervention was delivered in Thai by an MI-trained interventionist. The details of the intervention have been published elsewhere [11]. Session 1 focused on eliciting the participant's view of the behavior, exploring barriers as well as sociocultural factors affecting risks, and building motivation to initiate the change plan. Session 2 followed a similar format with a focus on the second targeted behavior. Sessions 3 and 4 were used to formalize the personalized behavior change plan, reinforce commitment to change, and identify strategies to maintain healthy behaviors and to prevent relapse. All MI strategies to enhance motivation were used throughout all sessions. The control group received four individualized sessions of general health education unrelated to HIV risk behaviors. Session 1 focused on healthy diet, Session 2 on exercise, and Session 3 on smoking and healthy sleep habits. Session 4 was an overall review of the participant's knowledge learned during the prior sessions. The contents of the sessions were adapted from the health education materials published by the Thai Ministry of Public Health. All sessions were delivered didactically by a research assistant who read the contents of the health education manual to the participant. The research assistant received no MI training and was instructed to avoid discussing HIV-related topics, including sexual behavior, HIV disclosure, alcohol and substance use, and medication adherence, with the participant.
There were six primary clinical measures for the success of the investigated intervention. (1) HIV sexual risk score. An HIV sexual risk scoring system was empirically created based on eight sexual behavior characteristics: sexual intercourse, condom use, number of partners, HIV status of partners, anal sex, receptive anal sex, receptive vaginal sex, and alcohol use with sex. A score (ranging from 1 to 13) was calculated for each participant at each study visit based on the individual's sexual activities in the previous 30 days. The purpose of the scoring system was to provide a broad view of the quantifiable magnitude of an individual's sexual risk. (2) Viral load. Blood samples for plasma HIV viral loads were obtained at baseline, 1-month follow-up, and 6-month follow-up in both study groups and were analyzed by COBAS Ampli-Prep/Amplicor HIV-1 Monitor Test, version 1.5 (Roche Molecular Systems, Branchburg, NJ), with the lower limit of detection at 50 copies/ml. (3) HIV stigma. Participants completed the 12-item HIV Stigma Scale, which was developed from Berger's 40-item HIV Stigma Scale [17]. The measure contains four stigma subscales, with three items per subscale, representing personalized stigma, disclosure concerns, negative self-image, and public attitude stigma. Cronbach's α was 0.80 in the present study. (4) Mental health. Participants completed the 12-item Thai General Health Questionnaire covering depression, anxiety, social impairment, and somatic complaints. All items were rated on a four-point Likert scale, ranging from 1 (not at all) to 4 (much more than usual). The scores were averaged and a mean score ≥ 2 was considered clinically significant. Cronbach's α was 0.85 in the present study. (5) Self-efficacy on confidence in avoiding multiple sex partners, and (6) self-efficacy on confidence in using condoms.
The Self-Efficacy for Health Promotion and Risk Reduction questionnaire contains 6 items on confidence in using a condom and 3 items on confidence in avoiding sex with multiple partners. Items were rated on a 5-point Likert scale ranging from 1 (very sure I cannot) to 5 (very sure I can). Cronbach's α was 0.89 in this study. Figs 1-3 display the histograms of HIV sexual risk scores, self-efficacy on avoiding multiple partners, and self-efficacy on condom use for treatment and control groups at baseline and 6-month visits. Figs 4-6 display the

Hypothesis testing on intervention effect
In the Thai Healthy Choices study, the effect of the four-session motivational interviewing-based intervention was simultaneously evaluated by six primary clinical measures: HIV sexual risk score, viral load, HIV stigma, mental health, self-efficacy of condom use, and self-efficacy of avoiding multiple sex partners. One approach to determining whether the intervention effect is statistically significant is to conduct a hypothesis test using the nonparametric robust test statistics to determine whether the intervention group and the control group differ in terms of the 6-dimensional multivariate clinical measure at the end of the study (i.e., at the 6-month visit). A total of 74 HIV-positive men who have sex with men were included in this analysis: 37 individuals in the intervention group and 37 individuals in the control group [18]. Among all participants, 16 had missing values, and these missing values were replaced with the sample mean of the corresponding variables in each group.
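The within-group mean imputation described above can be sketched in Python as follows. Missing entries are represented by `None`, the function is applied to one study group at a time, and the function name and toy values are hypothetical rather than study data.

```python
def impute_group_means(rows):
    """Replace None entries by the mean of the observed values in the same column.

    `rows` holds the observations of a single study group, one list per
    participant. A column with no observed value would raise a division error.
    """
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[k] if v is None else v for k, v in enumerate(row)]
            for row in rows]

# Toy group with two measures and two missing entries (illustrative values).
group = [[1.0, None], [3.0, 4.0], [None, 6.0]]
completed = impute_group_means(group)
```

Mean imputation keeps the sample sizes intact but shrinks the within-group variability of the imputed variables, so the scale estimates used by the standardized statistics may be slightly understated.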
Differences between sample means, medians and two Hodges-Lehmann location estimators of intervention and control groups are reported in S7 Table for each individual clinical measure. These differences suggest that the intervention effect may be driven by HIV sexual risk score and HIV stigma. Hypothesis tests were conducted to formally determine whether there was a distributional difference between intervention and control groups at baseline and 6-month visits. The null hypothesis was that the probability distributions of the multivariate clinical measure for intervention and control groups were identical, and the alternative hypothesis was that there was a location shift between the distributions of the multivariate clinical measure for intervention and control groups. The proposed test statistics $T_l$ and $T_l^*$, $l = 2, 3, 4, 5$, as well as Hotelling's $T^2$ and the extended U statistic, were applied to detect the location shift. The hypothesis testing results, including the values of test statistics and the corresponding p-values, are reported in the upper panel in Table 7. For the baseline samples, all test statistics failed to reject the null hypothesis at the significance level of 0.05, suggesting there was no statistically significant difference between the probability distributions of the two groups at the baseline visit. For the samples collected at the 6-month follow-up visit, the test statistics $T_l$ and $T_l^*$, $l = 2, 3$, and Hotelling's $T^2$ statistic rejected the null hypothesis but the others did not. This implied that distributional locations of the samples collected from the two study groups may be statistically different after 6 months of intervention. Furthermore, we compared the two samples collected at the baseline visit and the 6-month visit within each of the two study groups. 
The null hypothesis was that the probability distributions of the multivariate clinical measure were identical at the baseline and the 6-month visits for the intervention group or for the control group, and the alternative hypothesis was that there was a location shift between the distributions of the multivariate clinical measure at the baseline and the 6-month visits within each group. The hypothesis testing results are reported in the lower panel in Table 7. For the intervention group, the test statistics $T_2^*$ and $T_3^*$ rejected the null hypothesis whereas the others did not. For the control group, none of the test statistics rejected the null hypothesis. The hypothesis testing results indicate that there was a statistically significant intervention effect for the four-session motivational interviewing-based intervention developed in the Thai Healthy Choices study to reduce risk behaviors among youth living with HIV. A difference in the probability distributions of the multivariate clinical measure for intervention and control groups was detected after 6 months of intervention. Such a difference was also confirmed between baseline and 6-month follow-up visits for the intervention group.
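To make the testing procedure concrete, the following sketch computes a permutation p-value for a two-sample multivariate location test. It is an illustration under stated assumptions rather than the authors' implementation: the statistic used here (the maximum absolute component-wise difference of sample medians) is only a stand-in in the spirit of the unscaled median-based statistic, the function names are hypothetical, and the add-one correction in the p-value is a common convention that the text above does not specify.

```python
import random
from statistics import median


def max_median_diff(x, y):
    """Largest absolute component-wise difference between sample medians.

    `x` and `y` are lists of equal-length observation vectors.
    """
    p = len(x[0])
    return max(
        abs(median(r[j] for r in x) - median(r[j] for r in y))
        for j in range(p)
    )


def permutation_pvalue(x, y, stat=max_median_diff, n_perm=2000, seed=0):
    """Return (observed statistic, permutation p-value).

    Group labels are reshuffled n_perm times; the p-value is the share of
    reshuffled statistics at least as large as the observed one, with an
    add-one correction (an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = stat(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: two bivariate samples with a large location shift.
group_a = [[float(i), float(i)] for i in range(10)]
group_b = [[float(i) + 100.0, float(i) + 100.0] for i in range(10)]
obs_shift, p_shift = permutation_pvalue(group_a, group_b)

# Identical samples: every reshuffled statistic is >= 0, so p equals 1.
obs_null, p_null = permutation_pvalue(group_a, group_a)
```

Any other statistic with the same `(x, y)` signature, such as one built from Hodges-Lehmann estimators, could be passed through the `stat` argument without changing the permutation loop.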

Conclusions
This article proposes a series of robust nonparametric test statistics for detecting location shifts between two multivariate samples. The test statistics are constructed based upon robust estimators of distribution location, including the medians, the two Hodges-Lehmann estimators, and the extended U statistic. Four classes of test statistics are proposed, which include (i) maximum of the component-wise medians or the Hodges-Lehmann estimators, (ii) scaled maximum of the component-wise medians or the Hodges-Lehmann estimators, (iii) maximum of the scaled component-wise medians or the Hodges-Lehmann estimators, and (iv) the extended U statistic. The simulation studies suggest that the proposed robust nonparametric test statistics are effective alternatives to Hotelling's $T^2$. The simulation studies also show that the nonparametric tests built upon the Hodges-Lehmann estimators are generally more powerful than the others. Numerous nonparametric hypothesis testing procedures have been proposed for comparing a treatment group and a control group in clinical trials with a multivariate endpoint, in the context of the nonparametric Behrens-Fisher hypothesis testing problem [19][20][21][22]. Further investigation that compares these hypothesis testing procedures with the procedures included in this article may be relevant.
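For readers unfamiliar with the Hodges-Lehmann estimators mentioned above, the sketch below shows their standard univariate forms, which would be applied to each component of the multivariate measure. The function names are hypothetical, and including the pairs with i = j among the Walsh averages is an assumption of this sketch; some definitions use only i < j.

```python
from itertools import combinations_with_replacement, product
from statistics import median


def hl_one_sample(values):
    """One-sample Hodges-Lehmann location estimate.

    Median of the Walsh averages (x_i + x_j) / 2 over all pairs i <= j.
    """
    return median(
        (a + b) / 2 for a, b in combinations_with_replacement(values, 2)
    )


def hl_two_sample_shift(x, y):
    """Two-sample Hodges-Lehmann shift estimate.

    Median of all pairwise differences x_i - y_j.
    """
    return median(a - b for a, b in product(x, y))


# Hypothetical toy data; the value 10 plays the role of an outlier.
x = [1, 2, 3, 10]
y = [0, 1, 2]
loc_x = hl_one_sample(x)            # 2.75, versus a sample mean of 4.0
shift_xy = hl_two_sample_shift(x, y)  # 1.5
```

Both estimators take medians over pairwise combinations, which is why a single extreme observation such as the value 10 above moves them far less than it moves the sample mean.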
Supporting information
S1 Table. Type I Table. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on bootstrap approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with the non-independent variance-covariance matrix and sample sizes n = m = 20. (PDF)
S6 Table. Type I errors (δ = 0) and power (δ ≠ 0) given by the investigated test statistics based on bootstrap approach in detecting location shift between two samples generated from the four pairs of F(x) and G(x) with the non-independent variance-covariance matrix and sample sizes n = m = 50. (PDF)
S7 Table. Differences between sample means, medians and two Hodges-Lehmann location estimators of intervention and control groups at baseline and 6-month visits.