Feasibility of Recruiting a Diverse Sample of Men Who Have Sex with Men: Observation from Nanjing, China

Background Respondent-driven sampling (RDS) is well recognized as a method for sampling hard-to-reach populations such as commercial sex workers, drug users and men who have sex with men (MSM). However, the feasibility of this sampling strategy for recruiting a diverse spectrum of these hidden populations is not yet well understood in developing countries.
Methods In a cross-sectional study in Nanjing, Jiangsu Province, China, 430 MSM, including 9 seeds, were recruited using RDS over a 14-week study period. Information on socio-demographic characteristics and sexual risk behavior was collected, and participants were tested for HIV and syphilis. Duration, completion, participant characteristics and the equilibrium of key factors were used to assess the feasibility of RDS. Homophily of key variables, socio-demographic distribution and social network size were used as indicators of diversity.
Results In the study sample, the adjusted prevalence of HIV and syphilis was 6.6% and 14.6%, respectively. The majority (96.3%) of the participants were recruited by members of their own social network. Although there was a tendency for recruitment within the same self-identified group (homosexuals recruited 60.0% homosexuals), considerable cross-group recruitment (bisexuals recruited 52.3% homosexuals) was also seen. Homophily of self-identified sexual orientation was 0.111 for homosexuals. Upon completion of the recruitment process, participant characteristics and the equilibrium of key factors indicated that RDS was feasible for sampling MSM in Nanjing. Participants recruited by RDS were found to be diverse after assessing the homophily of key variables in successive waves of recruitment, the proportion of characteristics after reaching equilibrium and the social network size. The observed design effects were nearly the same as or even better than the theoretical design effect of 2.
Conclusion RDS was found to be an efficient and feasible sampling method for recruiting a diverse sample of MSM in a reasonable time.


Introduction
When conducting research involving hard-to-reach populations, such as injecting drug users (IDUs), commercial sex workers (CSWs) or men who have sex with men (MSM), ensuring the representativeness of the study sample is a major challenge. In the past two decades, many strategies have been used to recruit diverse samples from these hidden population groups, but the generalizability of the study results has remained an issue and selection bias has always been a possibility. These recruitment methods included time-location sampling, snowball sampling and targeted sampling. However, most of these methods provided only limited coverage and could claim only pooled representation of the target populations [1]. To deal with these problems, Heckathorn DD introduced respondent-driven sampling (RDS) in 1994, while studying HIV-related risk behaviors among IDUs residing in the eastern part of the United States [2].
In the past few years, RDS has been widely used in many countries to recruit hard-to-reach populations and to conduct large-scale HIV serologic and behavioral surveys [3]. It has also been recognized and adopted by public health researchers as a promising alternative method for sampling most-at-risk populations [4][5][6][7][8][9]. RDS is a variant of chain-referral methodology, which requires that the population of interest be internally well connected through social networks [2,9]. The method uses a mathematical model to compensate for non-representativeness by keeping track of the recruitment process and applying probabilistic weights. When sampling from a hidden population from which a random sample cannot otherwise be drawn, the initially selected participants ("seeds") in RDS need not be random to yield a reasonably representative sample, as long as the seeds belong to the target population. As recruitment continues, the distribution of sample characteristics gradually stabilizes; this condition is termed "reaching equilibrium" [2,10,11]. These features of RDS help to minimize the sampling biases commonly found in chain-referral sampling [2,9]. However, the feasibility of RDS has remained unclear.
The MSM population probably plays a significant role in the HIV epidemic in China where, as in other Asian countries, HIV sero-positivity is rising among MSM. Nanjing, a metropolitan city located in one of the largest economic zones of China with a population of approximately 8 million (according to the 2010 census), is no exception. HIV positivity among participating MSM in Nanjing increased from none detected in 2003 to 5.8% in 2007 [12]. "Money boys" are men who provide commercial sexual services to other men in different parts of China and some other countries. In Nanjing, commercial sex with money boys is very common among the MSM population, especially among young men [13,14].
Several studies have been conducted to assess whether RDS is a feasible strategy for recruiting samples from hidden populations [5,7,8]. However, most of these studies either did not evaluate the efficiency of RDS in reaching a diverse spectrum of hard-to-reach populations or did not address design effect (Deff) and sample size issues. These limitations prompted this study among the MSM population of Nanjing, a large metropolitan city of China, with the aims of assessing whether RDS is feasible for reaching this hard-to-reach population, whether the strategy yields a diverse sample, and whether it attains the designed sample size and design effect.

Ethics Statement
All participants provided written informed consent to take part voluntarily in this survey. Signed informed consent was obtained from each participant prior to interview, blood collection and intervention at each round of the survey. Each participant could decline or withdraw from the survey at any time. The questionnaires and written consent documents were kept separately in locked cupboards at the study sites, and unauthorized persons had no access to them.
The study process and content were approved by the Ethics Committee at Jiangsu Provincial Center for Disease Prevention and Control (JSCDC, Nanjing, Jiangsu 210009, China).

Study Design and Sampling Methods
Between May and August 2008, a cross-sectional study was conducted using RDS as the sampling strategy for recruiting MSM in Nanjing, China. Sampling began with a set of initial participants ("seeds") recruited with the help of MSM community-based organizations and operators of bars and bathhouses/spas, and from restrooms/parks or the internet. Nine seeds who differed from each other in income, age, occupation and "cruising area" (venues for meeting sexual partners) were thus recruited. These seeds initiated the expanding chain of referrals, whereby respondents from each link in the chain, or "wave", referred other respondents to form the subsequent waves of referral [10]. It has been shown that, with RDS, any member of a hard-to-reach community can theoretically be reached within six waves (the principle of "six degrees of separation") [10]. In this study, each seed initially recruited three other MSM from his social network for behavioral evaluation and serological testing, using uniquely numbered coupons to allow tracking of the recruitment process. Each respondent received a gift (containing lubricant and condoms) worth approximately 4.50 USD to compensate for his time. Three recruitment coupons were also given to each respondent to pass on to acquaintances. If at least one new participant was recruited with a coupon, the respondent who made the referral received an additional gift (a prepaid phone card) worth approximately 4.50 USD as a token of appreciation. Six key factors were used to monitor whether RDS had reached equilibrium: official residency, education level, marital status, syphilis sero-status, sexual orientation and cruising area.
Homophily is a statistic that describes the mixing patterns in networks and the probability of an HIV-positive individual successfully referring another HIV-positive individual from a population. Homophily can be negative or positive (ranging from −1 to +1), depending on whether an individual avoids or preferentially contacts someone with the same given characteristic [6,10,15-18]. It has been shown that when homophily is zero for all groups, the estimated population proportion (EPP) for the target population is identical to the actual sample population proportions (SPE) [2,6,10,15-19].
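As a rough illustration of how a homophily index of this kind can be computed, the sketch below uses the form commonly associated with Heckathorn's RDS work: the share of a group's recruits who belong to the same group is compared with that group's estimated population share, and the difference is scaled so that the result lies between −1 and +1. The function name and argument names are illustrative assumptions, and this is not claimed to be RDSAT's own code.

```python
def homophily(self_recruit_share: float, population_share: float) -> float:
    """Homophily index in [-1, +1] for one group (illustrative sketch).

    self_recruit_share: fraction of the group's recruits who are in the same group.
    population_share:   estimated fraction of the population in that group.
    """
    if not 0.0 < population_share < 1.0:
        raise ValueError("population_share must lie strictly between 0 and 1")
    excess = self_recruit_share - population_share
    if excess >= 0:
        # In-group preference: scale by the largest possible excess.
        return excess / (1.0 - population_share)
    # In-group avoidance: scale by the largest possible shortfall.
    return excess / population_share


# Syphilis recruitment shares and population estimates reported in the Results.
print(round(homophily(0.212, 0.146), 3))  # syphilis-positive recruiters
print(round(homophily(0.891, 0.854), 3))  # syphilis-negative recruiters
```

With the rounded percentages reported in the Results, the first call gives about 0.077, in line with the reported homophily for syphilis-positive participants, and the second gives about 0.253, close to the reported 0.251; the small gap is presumably due to rounding of the inputs.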
To obtain a distribution of sample characteristics free of the biasing effects of non-random seed selection, equilibrium distributions were set in RDSAT (software for statistical analysis of data from samples recruited by RDS) to fall within 2% of the sample distribution [2,10,11]. Social network size was defined as the number of MSM in the city known to the participant (familiar by face and name/nickname, with contact information available, and reachable within the next month). The design effect was defined as the ratio of the actual variance under the sampling method used to the variance computed under the assumption of simple random sampling [19,20].
The study was conducted at the clinic for sexually transmitted infections (STI) of the Center for Disease Control and Prevention of Jiangsu Province (JSCDC) between May and August, 2008. Eligibility criteria for the participants were: 1) male; 2) having sex with men (oral and/or anal) within the past year; 3) 18 years or older; 4) had not participated in a similar survey within the past three months and 5) had a valid referral coupon.

Measures
Data measures. Duration of the survey, completion of the recruitment process using RDS, characteristics of the seeds and their recruits, and the equilibrium of the key factors were used as the parameters for evaluating the feasibility of RDS. In addition, design effects (Deff) of selected variables were determined. Homophily of key variables, the proportional distribution of selected demographic variables and the social network size were the indicators used to assess the diversity of the RDS sample.
During data collection, the process of distribution of the referral coupons was kept restricted to control exponential sample growth. The number of distributed coupons was reduced from three to two and then to one after the sample size reached 350, and no more coupons were distributed after the sample size reached 420.
Face-to-face interviews with a structured questionnaire were conducted to collect information on recruitment patterns, demographics, HIV knowledge, coverage of HIV prevention services, recent sexual behavior and drug use, and STI-related symptoms. Demographic information included age (less than 20, 20-29, 30-39, 40-49, and over 50 years old), marital status (single, married and divorced/widowed), occupation (15 occupational categories that included most recognized occupations), education level (illiterate, elementary school, junior high school, senior high school, technical secondary school and junior college/college degree/higher), residency (official resident of Jiangsu province or other provinces) and family income in Yuan (less than 2,000, 2001-3000, 3001-4000, 4001-5000, and more than 5000; 1 USD = approximately 6.4 Yuan).
Self-identified sexual orientations were classified as undecided, only homosexual, mainly heterosexual and bisexual. Equilibrium distributions were assessed based on sexual orientation, being single, official residency for Nanjing, college degree or higher education, proportion of syphilis sero-positive cases and recruiting sexual partners online. Homophilies were also calculated for self-identified sexual orientation and proportion of syphilis cases.
Serologic measures. Five ml of venous blood was collected from each subject for HIV and syphilis testing. HIV antibodies were screened using a rapid test (Acon Biotech Co., Ltd, lot 200803973/WB). If the screening result was positive, it was confirmed by Western blot (HIVBLOT 2.2, Genelabs Diagnostics, Singapore, lot AE8039). Syphilis antibodies were screened using Rapid Plasma Reagin (RPR; Beijing WanTai Biological Pharmacy Enterprise Co., Ltd., lot N20080404) test and confirmed with Treponema Pallidum Particle Agglutination assay (TPPA; Livzon Group Reagent Factory, Guangdong, China, lot VN80803). Syphilis positivity was deemed ''current'' when both TPPA and RPR assays were positive.

Sample size and design effect (Deff)
Using Deff = 2.0, for the detection of a 10% increase in high risk sexual behavior with 80% power and 95% confidence level, the required sample size for this study was calculated to be 460. The detailed sample size estimation process has been described in Appendix S1. In our study, 430 participants were recruited during 14 weeks of study period, including the nine seeds. The Deff for being HIV-negative was 2.48, syphilis-negative 1.87, engaged in unprotected anal sex 2.20, only homosexual 1.85, and having college degree or higher education 2.95. The detailed estimation process for standard error (se), variance using RDS and observed Deff has been described in Appendix S2.
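The definition of Deff as a variance ratio can be sketched in a few lines. The example below is a minimal illustration, not the procedure of Appendix S2: it assumes the simple-random-sampling variance of a proportion is p(1 − p)/n, and it assumes (hypothetically) that the reported Deff of 2.48 for being HIV-negative refers to the adjusted estimate of 1 − 0.066 = 0.934 with n = 430. All function names are illustrative.

```python
import math


def srs_variance(p: float, n: int) -> float:
    """Variance of a proportion under simple random sampling."""
    return p * (1.0 - p) / n


def design_effect(var_rds: float, p: float, n: int) -> float:
    """Deff = variance under the actual design / variance under SRS."""
    return var_rds / srs_variance(p, n)


def implied_rds_se(deff: float, p: float, n: int) -> float:
    """Standard error implied by a given Deff (inverse of design_effect)."""
    return math.sqrt(deff * srs_variance(p, n))


def effective_sample_size(n: int, deff: float) -> float:
    """Size of a simple random sample with the same precision."""
    return n / deff


# Assumed inputs: p = 0.934 (HIV-negative), n = 430, reported Deff = 2.48.
se = implied_rds_se(2.48, 0.934, 430)
print(round(se, 4))                                # implied standard error
print(round(effective_sample_size(430, 2.48), 1))  # effective sample size
```

Under these assumptions the implied standard error is roughly 0.019 (about 1.9 percentage points), and the 430 RDS participants carry about as much information as a simple random sample of roughly 173.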

Data Analyses
Respondent Driven Sampling Analysis Tool (RDSAT), version 5.6 (available free online at http://www.respondentdrivensampling.org) was initially used to calculate the population-adjusted point estimates, 95% confidence intervals and levels of homophily. RDS uses network information to account for potential sources of bias in the sample and provides mathematical methods for adjusting estimates based on these biases [21]. Hence, the data gathered using RDS methods in this study were analyzed with RDSAT, weighting each participant by the inverse of his probability of selection, which is taken to be proportional to the size of his network, to adjust for potential sampling biases. We also used STATA 10.0 to calculate the equilibrium and Deff.
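The idea of weighting by the inverse of network size can be illustrated with a small sketch. The code below implements a plain inverse-degree weighted proportion (in the style of the Volz-Heckathorn estimator); it is an assumption-laden illustration and not RDSAT's estimator, which also draws on the observed recruitment patterns. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def crude_proportion(has_trait: Sequence[bool]) -> float:
    """Unweighted sample proportion."""
    return sum(has_trait) / len(has_trait)


def weighted_proportion(has_trait: Sequence[bool], degrees: Sequence[int]) -> float:
    """Proportion with each participant weighted by 1 / network size.

    Participants with large networks are more likely to be recruited,
    so they are down-weighted; those with small networks are up-weighted.
    """
    if len(has_trait) != len(degrees):
        raise ValueError("has_trait and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("network sizes must be positive")
    weights = [1.0 / d for d in degrees]
    trait_weight = sum(w for w, t in zip(weights, has_trait) if t)
    return trait_weight / sum(weights)


# Hypothetical toy data: one participant with the trait and a small network,
# two without the trait and larger networks.
trait = [True, False, False]
network_size = [2, 4, 4]
print(crude_proportion(trait))                   # unweighted share
print(weighted_proportion(trait, network_size))  # inverse-degree weighted share
```

In this toy example the weighted share exceeds the crude share because the participant with the trait reports the smallest network. An adjustment of this kind is one way a crude prevalence can move after weighting.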

Results
Among the 430 participants recruited in this study with the help of the 9 seeds, 20 were HIV positive, with crude prevalence of 4.6% and adjusted prevalence of 6.6% (adjusted by RDSAT, based on the weight of network sizes). The characteristics of the 9 seeds for this study are presented in Table 1.

Feasibility
Among the 455 respondents who visited the study site and provided consent, 25 did not have a valid recruitment coupon and were excluded from the study. Data from the other 430, including the 9 seeds, were collected over a period of 14 weeks (between 5th May and 1st August, 2008). The survey office was open six days a week from 8:30 am to 5:00 pm, and two to four interviewers were present at the site during this period. The average number of completed interviews was 30.71 per week (minimum = 2, maximum = 64). The wait time for being interviewed and seen by the clinicians was less than 20 minutes, and the average time that a participant spent at the survey site was about 45 minutes. Figure 1 shows the recruitment flow of the participants (level of enrollment) as the survey progressed toward the targeted sample size. As recruitment progressed beyond the 5th week, the percentage of participants coming to the survey site started to decrease. When the number of recruited subjects reached 350, we gradually reduced the number of recruitment coupons given to participants from 3 to 2 and then to 1, while the coupon validity was shortened from 15 to 10 days. After the recruitment of 420 subjects, no more coupons were distributed. By the end of the 14th week, the survey reached the sample size of 430.
To see how coupons were distributed by the participants within their social networks, participants were asked about their relationship with the person who gave them the coupon. 41.6% received coupons from a close friend, 3.0% from a sexual partner more than six months ago and 13.3% from a sexual partner within the past six months. Only 3.7% reported receiving one from a stranger. The majority (90.2%) of participants reported that their primary reason for accepting a coupon and coming to the clinic site was to be tested and treated for HIV/STIs. Only 0.3% reported coming only for the incentive. An equilibrium distribution system was used to evaluate whether the recruitment met the design needs. The rate of syphilis reached equilibrium by the fifth wave. Equilibrium distribution for being self-identified homosexual, single and resident of Nanjing were all reached by the seventh wave while usually meeting sexual partners on the internet and having a college degree or higher education were reached by the tenth wave.
Diversity
Of the participants, 56.8% found their sexual partners on the internet; 16.2% at pubs, discos, tearooms or clubs; 15.2% at spas, bathhouses, saunas or massage centers; 4.3% in parks and public restrooms; and 7.6% somewhere else (Table 2). Table 2 also presents the joint recruitment patterns of the venues for meeting sexual partners. The homophily of this variable was not close to zero (0.325 for bars, 0.240 for bathhouses, 0.103 for parks, 0.57 for internet and −1.0 for others).
In our study, 5.0% participants reported that they at least once paid for sex in the past six months, and another 5.8% participants reported that they were paid for sex in the past six months. Meanwhile, 2.6% participants self reported that they were drug users. Figure 2 shows the branching patterns of the path of maximum recruitment (88 recruits, 8 waves), which began with one self-identified homosexual seed. The figure also shows the cross distributions of sexual orientations among different chains.
The homophily of self-identified sexual orientation was 0.111 for only homosexuals (n = 242), indicating that only homosexuals were probably socially insular and preferentially recruited other self-identified homosexuals (59.7%) over MSM of other orientations (40.3%). Bisexuals (n = 167) demonstrated a homophily of −0.012 and mainly heterosexuals (n = 10) demonstrated a homophily of 0.236. Table 3 shows the RDS estimates for the proportion of syphilis cases among participants. MSM infected with syphilis (n = 54) demonstrated a homophily of 0.077, while those who were syphilis-negative (n = 376) demonstrated a homophily of 0.251. Although only 14.6% of the population was syphilis-positive, syphilis-positive participants recruited 21.2% syphilis cases and 78.8% syphilis-negatives. While 85.4% of the population was syphilis-negative, syphilis-negative participants recruited 89.1% syphilis-negative and 10.9% syphilis-positive participants. However, cross-recruitment was substantial (19.2%), implying that recruitment chains did not become trapped within a single group but instead crossed group lines. This contributed toward a strong convergence between the sample composition (12.6% syphilis cases) and the equilibrium sample composition (12.1% syphilis cases). It also indicated that homophily and estimated network size were both potentially important factors in the referral process. Average network size was 6.9 and 5.6 for syphilis-negative and syphilis-positive participants, respectively. The recruitment patterns according to self-identified sexual orientation indicated mixing of different orientations among participants (Table 4). However, there was a tendency for recruitment within a group, along with considerable cross-group recruitment. Overall, only homosexuals recruited 60.0% only homosexuals, 1.2% mainly heterosexuals and 36.8% bisexuals. Bisexuals recruited 40.5% bisexuals and 52.3% only homosexuals.
There was a strong convergence between the sample composition (56.8% only homosexuals, 2.4% mainly heterosexuals, 38.2% bisexuals) and the equilibrium sample composition (56.7% only homosexuals, 2.6% mainly heterosexuals, 38.1% bisexuals). Homophily for self-identified sexual orientation was 0.111 for only homosexuals, 0.236 for mainly heterosexuals and −0.012 for bisexuals. Mainly heterosexuals had a larger network size (9.3) than only homosexuals (7.0) and bisexuals (6.2). When the equilibrium distribution was reached, the proportions of self-identified sexual orientations were 56.4% only homosexual, 38.8% bisexual and 2.2% mainly heterosexual.
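The equilibrium composition referred to above can be understood as the stationary distribution of a Markov chain whose transition probabilities are the observed recruitment shares. The sketch below, which is an illustration rather than RDSAT's implementation, applies simple repeated multiplication to the syphilis recruitment shares reported in this section (positives recruited 21.2% positives and 78.8% negatives; negatives recruited 10.9% positives and 89.1% negatives). The function name, tolerance and starting distribution are assumptions made for the example.

```python
from typing import List


def equilibrium(transition: List[List[float]],
                tol: float = 1e-12,
                max_iter: int = 10_000) -> List[float]:
    """Stationary distribution of a recruitment matrix by repeated multiplication.

    transition[i][j] is the share of recruits of group i who belong to group j;
    each row is expected to sum to 1.
    """
    n = len(transition)
    dist = [1.0 / n] * n  # arbitrary starting composition
    for _ in range(max_iter):
        new = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


# Rows: recruiter is syphilis-positive, syphilis-negative.
# Columns: recruit is syphilis-positive, syphilis-negative.
syphilis_recruitment = [
    [0.212, 0.788],
    [0.109, 0.891],
]
positive_share, negative_share = equilibrium(syphilis_recruitment)
print(round(positive_share * 100, 1))  # equilibrium share of syphilis cases, in %
```

With these reported shares the equilibrium proportion of syphilis cases comes out at about 12.1%, which agrees with the equilibrium sample composition given in the text. For a two-group case the same value follows directly from 0.109 / (0.109 + 0.788).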

Discussion
Our study began with nine seeds, recruited a total sample of 430 eligible participants, and observed crude and adjusted HIV prevalences of 4.6% and 6.6%, respectively. The unadjusted prevalence was nearly the same as that observed in a 2007 study that used non-probability (convenience) sampling and found an HIV prevalence of 4.7% [22]. After adjustment, the HIV prevalence in our study changed by about 43% in relative terms (from 4.6% to 6.6%). Several potential reasons could have led to this difference. First, because the prevalence of HIV was not high, a slight change in the weights might have changed the estimate substantially. Second, our study might still have had the problem of in-group affiliation (e.g., the large homophily for venues where the participants met their partners), which could have biased the crude HIV prevalence; RDSAT may have eliminated or reduced this bias.
The recruitment period extended over 14 weeks and reached a maximum of 14 waves. Participants with varying self-identified sexual orientations who looked for partners in varying venues were recruited, and the cost of recruitment was acceptable. The nine seeds for our study were selected non-randomly, based on their differing sexual orientations, cruising areas and ages. Nevertheless, our findings suggested that, using RDS with a small number of seeds recruited non-randomly from well-defined venues, successful recruitment of participants from a broader spectrum was feasible. This reflected the Markov theory that biases introduced into a chain-referral sample by the non-random selection of an individual (seed) are weakened with each recruitment wave and are ultimately eliminated [2,19,23]. As recruitment continued, the participants became gradually more distant from the seeds who initiated their chains, and the recruitment sustained itself for at least some, if not all, of the seeds.
An important characteristic of RDS is that, when successful, it can reach the most hard-to-reach subgroups, such as drug users and money boys, by capturing individuals in hidden groups through the social networks of those groups. The results of our study demonstrated this ability. About 5.0% of participants had paid for sex in the past six months, 5.8% had been paid for sex in the past six months, and 2.6% had used drugs in the past year. These findings further supported the diversity of the participants in our study.
We found that the majority of the participants were recruited by someone in their own personal social network (96.3%). The majority (90%) responded because they wanted to be tested and treated (if required) for HIV, syphilis and HCV. While we accept that the diversity demonstrated in our study did not ensure representativeness, due to the fact that there was always a higher likelihood for selection of more concerned and compliant participants, we still claim that offering HIV/STI testing and treatment brought forth more participants from this hidden population.
The variables of interest reached equilibrium and a relatively small number of waves were needed to yield sufficient sociometric depth to attain an equilibrium distribution. To reach the equilibrium of self-identified sexual orientation, marital status, residency, proportion of syphilis-positives and education levels, we needed to enroll seven, seven, six, seven, five, and ten waves, respectively. Reaching equilibrium also confirmed the uniformity of the seeds who remained in the study, which ensured more internal validity. From these results, we concluded that after adequate waves, reaching equilibrium in recruitment by RDS was feasible.
We used homophily scores to analyze the performance of RDS in recruiting diverse samples. Homophily, an index for evaluating whether an RDS sample is diverse [5,8], indicated that RDS was an effective method for reaching a diverse sample. The cross-recruitment and the related homophily implied that self-identified heterosexual MSM were not closely linked with the MSM networks of other orientations, possibly because most MSM who identified themselves as heterosexuals were money boys. The cross-recruitment among the other groups was substantial, implying that recruitment chains did not become trapped within a single group but instead crossed group lines. This also meant that although the seeds were limited in some particular characteristics, the entire sample was not trapped within a particular group and we were successful in recruiting a diverse study population.
Feasibility of RDS for Recruiting MSM in Nanjing. PLOS ONE | www.plosone.org
However, the large homophily (not close to zero) for venues for meeting sexual partners also indicated that our study still oversampled certain groups within the hidden population. This was evidenced by the likely overestimate of the proportion of subjects who usually met their sexual partners through the internet (56.8%).
The distribution of the social network sizes of the participants changed significantly when we adjusted the crude distribution for the sizes of the social networks. The large average social network sizes calculated according to sexual orientation demonstrated that participants had strong network ties with other MSM. This might indicate that social networks were important for the distribution of recruitment coupons and for the continuation of recruitment. Therefore, social networks might be a very good medium for propagating HIV and STI prevention programs. The mainly heterosexual participants had the largest social network sizes, perhaps because most were money boys.
Our study was initially designed to have a sample size of 460 but only 430 were actually recruited. However, we found that the selected Deffs were nearly equal to or higher than 2, which demonstrated that the recruited sample size had met the needs.
In addition, our study found that a large proportion of participants engaged in unprotected anal intercourse (UAI), although the majority had college-level education. This result corroborated the findings of our previous study [24]. Although these two observations seem inconsistent with each other, this situation is common among MSM, and our previous study had already demonstrated that education and knowledge were poor predictors of UAI. This behavior was probably best predicted by intention and was not associated with education or knowledge [25]. Other studies on sexually transmitted infections have reported similar findings [26].
RDS is usually feasible with the right choice of recruitment incentives and an appropriate sample size given the homophily and cross-group transitions in the recruitment patterns. Only when the networks of two subpopulations are independent is RDS generally unable to capture individuals from both populations. In our study, the situation appeared similar to the feasible case, which probably indicated that the choice of incentive worked well and that the networks among our participants were not independent. The possibility that networks developed faster in one group than in another existed among our study subjects, as was evident from the difference in homophily across groups. As the social network size was not extremely large, the estimates of the cross-group probabilities were likely to be influenced. Our use of different seeds was somewhat likely to have mitigated this problem, although some residual effect remained possible.
Table 2. Venues for meeting sexual partners among participants and recruiter.
We also recognize that our study has several limitations. First, there was a possibility of information bias, especially recall bias. Second, although RDS has been successful in past studies of IDUs, female sex workers (FSWs) and MSM in terms of recruitment efficiency [10,27-30], the data we collected were hard to analyze using conventional statistical software, particularly for univariate and bivariate analyses. We used RDSAT 5.6, software designed specifically for the analysis of data collected through RDS. An RDS sample without proper adjustment is nothing more than a very good snowball sample (not a representative sample). However, there might still be potential for bias due to oversampling participants with large personal networks, which was addressed in our study by limiting the number of coupons distributed during later waves. Also, our study was conducted in an STI clinic, which had the potential to introduce selection bias, and we think RDS had only partial capability for controlling this. To control and reduce the effects of some of these problems, we used the following methods. Four interviewers were trained to minimize interviewer bias while conducting the face-to-face interviews. We also appointed two persons to check questionnaires for completeness after each interview to minimize errors and inconsistencies. If any error was detected, it was corrected before the participant left the survey site. We believe that, given the socio-behavioral issues facing MSM, RDS was the optimal sampling method for our objectives. Further, we used the most advanced methods to adjust the collected data.
Table 3. Characteristics of the estimation process using RDS, regarding syphilis serostatus.

Conclusion
This was the first time RDS was used for sampling MSM in Jiangsu Province. Overall, we found that use of RDS among MSM in Nanjing was feasible and the recruited sample was diverse. Thus, RDS was found to be an effective strategy to achieve a diverse sample of MSM. RDS may be applicable in other cities in Jiangsu Province and other areas of China to gather data from MSM, CSWs and IDUs for serologic and behavior surveillance in future.

Supporting Information
Appendix S1 The sample size and power estimation.