A performance assessment of web-based respondent driven sampling among workers with precarious employment in Sweden

Objectives Precarious employment (PE) is a social determinant of poor health of workers. However, this population usually lack a sampling frame, making it challenging to identify the characteristics of this group. Web-based respondent driven sampling (webRDS) recruits individuals online through the social network and can provide population estimates. This study aims to assess the performance of webRDS in a population of workers with PE. Method WebRDS was used for recruitment and data collection in the PRecarious EMployment In Stockholm (PREMIS) study. Cross-sectional questionnaire data was collected between November 2016 and May 2017. Eligible participants were living and/or working in Stockholm County, 18–65 years old, had a personal identification number and were currently employed. WebRDS performance was assessed by the total sample size, length of recruitment chains, sample composition, sample proportions and estimated RDSII population proportions with confidence intervals. Results The webRDS process resulted in a sample of 358 recruits and a total sample of 415 participants, recruited over 1–15 waves. Of the participating seeds and recruits, 60% and 48%, respectively, successfully recruited at least one peer. The sample composition stabilized for all variables assessed. The sample proportions and RDSII estimates differed by 1–8% and the confidence intervals included the sample proportions for all variables except one. Conclusions WebRDS successfully recruited a sufficient sample of workers with precarious employment from which population estimates could be made. Future studies should consider implementing webRDS on a national level in order to further study this population.


Results
The webRDS process resulted in a sample of 358 recruits and a total sample of 415 participants, recruited over 1-15 waves. Of the participating seeds and recruits, 60% and 48%, respectively, successfully recruited at least one peer. The sample composition stabilized for all variables assessed. The sample proportions and RDSII estimates differed by 1-8% and the confidence intervals included the sample proportions for all variables except one. PLOS  Introduction various populations, which include men who have sex with men in Sweden and Vietnam [24,26], university students and young adults in the US [19,27], and the general population in the Netherlands and Thailand [28]. WebRDS holds promise for recruiting precariously employed in Sweden as internet use is high-93% of the population has internet access at home and 80% use internet every day [29]. Further, online studies provides access to the study at the chosen time and place of the participant, as well as it offers a high sense of confidentiality.
The aim of this paper is to assess the performance of web-based respondent driven sampling in a population of workers with precarious employment in Stockholm County, Sweden.

Study design and study population
The current study is a web-based cross-sectional study conducted within the frame of the longitudinal research project PRecarious EMployment in Stockholm (PREMIS). PREMIS aims at studying health outcomes of PE among residents in Stockholm County. Inclusion criteria for participants were: holding a precarious employment at the time of participation, living and/or working in Stockholm County, being 18-65 years of age, holding and indicating a Swedish personal identification number, and giving informed consent. Precarious employment, in terms of current employment situation, was defined as primarily working but without a fixed, full-time, employment. In this study, this included one of the following types of employment: involuntary part-time employment, temporary employment, employed by the hour/called when needed, self-employed (because he/she can not get an employment), or working as a trainee. Exclusion criteria were: indicating an inaccurate personal identification number, having a fixed and full-time employment, being voluntarily self-employed, being a student or a pensioner.

Web-survey design and data collection
We used a webRDS software [24,26,30] for peer recruitment and data collection via an online survey. The webRDS software contains a system for sending out e-mail or link-based invitations to the study. Each invitation contains a unique link to the survey, which have associated unique study ID's. Through these ID's, the webRDS system keeps track of the recruitment chains. The software also contains a system for sending out reminders for participation and for distributing incentives. Date of first log in to the survey, completion date, hashed IP number and type of device used to complete the survey, is also collected by the software.
The design of the web-page aimed at giving a good overview, a neutral impression and being easy to navigate. The participants could answer the survey in Swedish or English. The following sections were included in the survey: eligibility, informed consent and personal identification number, work, work environment, employment conditions, current life situation, health, background and social network information (S1 File). The survey applied the option to skip questions, except in terms of questions regarding eligibility criteria and informed consent. In order to limit the number of questions in the survey, information that could be obtained from registers (e.g., questions on education, foreign background etc.) was not included.
Eligibility was assessed on the first page and ineligible individuals could not proceed with the survey. Eligible respondents were prompted with detailed information about the study (S2 File) and written informed consent was attained by the respondent clicking "Yes" to "I understand the information given above and want to participate", and thereafter proceeding to enter their personal identification number. Age was assessed when the participant entered their personal identification number which is based on the date of birth and four control numbers including a check-sum algorithm [31]. Opt-out took place by clicking "No, I do not want to participate". Respondents indicating that they did not want to participate in the study were not contacted any further.
The webRDS recruitment process and survey was pilot tested with six convenience sampled participants that matched the inclusion and exclusion criteria.

Seed recruitment
Recruitment was initiated by spreading information about the study through the reference group involved in the study and word of mouth; and through advertising in areas around Stockholm and online. Interested individuals were referred to the project website where information about the study and contact details to the study coordinator were published. Potential seeds who made contact were contacted via telephone. During the call, the study coordinator provided information about the study, ensured that the seeds matched the inclusion criteria, were motivated to participate, as well as able and willing to invite peers. Further, information on sex, age, and type of employment was assessed during the call in order to recruit seeds as diverse as possible. In order to recruit a group of successful seeds, the survey was sent to seeds evaluated to be a good fit for the study based on these assessments.

WebRDS recruitment and sampling
Participants were asked to invite 1-4 peers from their own networks after completing the survey. Participants could send invitations via direct e-mail, indirect e-mail, link, or via What-sApp or text message (the latter two if completing the survey on a smart phone) through the invitation feature of the webRDS software. If the participant invited four peers, and thereby generating four unique survey links and study ID's, the webRDS system notified the participant that the maximum number of invitations were sent and thereafter no new invitations could be generated. The invitations included information on the study aim, eligibility criteria and contact information to the study coordinator. Reminders to respond to the survey were sent continuously during the data collection period. A first reminder was sent after three days via e-mail from the webRDS software. After 14 days, an e-mail reminder was sent to the recruiter of the peer. If the recruiter was a seed, the reminder was delivered via a phone call, primarily, or a text message, secondarily. A third and fourth reminder was sent after approximately 45 and 90 days, respectively. Further, reminders to invite peers were sent five days after completion to participants that invited one or no peers.
The study applied a dual incentive system, as is customary in RDS methodology [18]. Participants received a small economical reimbursement for participating in the survey and when two of their invited peers participated in the survey. The compensations were either a cinema ticket or a gift card worth 100 Swedish kronor (SEK; approximately 11 USD). These compensations were decided upon through thorough discussion with the reference group and thereafter piloted.

Data cleaning and analysis
All submitted surveys were assessed in terms of eligibility criteria, completeness and suspected cheating (repeated participation). Ineligible participants and participants re-using or giving an incorrect personal identification number [31] were excluded. If two or more surveys contained the same personal identification number, the survey(s) submitted at the later point in time were disregarded. Suspected cheating was assessed by comparing the date and time of participation among peers invited by the same recruiter. Participants were further investigated if there was <30 minutes between one peer completing the survey and the other starting the survey. These participants were excluded if they shared features implying repeated participation, such as having similar e-mail addresses, using the same browser, device and operating system in several following waves of the recruitment tree. Ineligible participants are displayed in the recruitment trees, but otherwise removed from the main analyses.
WebRDS performance was evaluated by assessing the recruitment dynamics and sample characteristics. The previous was done by reporting the total number of participants, number of waves in the recruitment chains, and median network degree; plotting the cumulative sample; recruitment homophily; and assessing the stabilization of sample composition with every 25-participant increase in sample size, i.e., equilibrium. Network degree is the number of individuals that the participant is connected to in the target population [19]. This was assessed by the question "How many people whom are also precarious workers (working but without a fixed, full time employment), older than 18 years, could you invite to this study via internet if the invitations were not limited to four?". Recruitment homophily is the ratio of the number of recruits with the same characteristic as their recruiter to the number expected by chance (i.e., 1 indicates no homophily) [32]. Equilibrium was considered reached at the point when the change in sample composition was 2% or less over five consecutive 25-participant increases. Sample characteristics were summarized in terms of sample proportions, estimated RDSII population proportions and confidence intervals [23]. Seeds were included in the full sample and analyses if not otherwise stated. Chi square analyses and Mann-Whitney U tests were conducted in order to compare recruiting and non-recruiting seeds. Recruitment homophily was assessed both for the eligible recruits and eligible recruits plus ineligible recruits in order to detect potential differences. Response rate was defined as the number of entries to the survey divided by the number of invitations sent out by the system (i.e., seed and peer invitations) during the data collection period. Non-response rate was defined as 1-response rate. Analyses were conducted using SPSS version 24

Ethical considerations
The study was approved by the regional ethics board of Stockholm (dnr: 2016/1291-31/5). Written informed consent was attained by the respondent clicking "Yes" to the question "I understand the information given above and want to participate" after reading the study information. Data was stored on password protected encrypted servers. Personal identification numbers were replaced by serial numbers and stored separately after data collection.

Results
Data was collected between the 24 th of November 2016 and the 1 st of May 2017. A total of 75 seeds were invited continuously, out of which 57 participated. Overall, 957 invitations were sent out from the system and 595 potential participants entered the survey, giving a response rate of 62%. Participants were excluded before participation due to not matching the eligibility criteria or not completing the survey (n = 112), and after participating in the survey due to being deemed ineligible (n = 68). Of the latter, participants were excluded due to not matching employment or county (n = 6), re-using a personal number (n = 8), giving an incorrect personal number (n = 17), being underage (n = 1) and due to suspected cheating (n = 36). Participants excluded due to suspected cheating belonged to the same recruitment tree. The remaining 415 participants were considered the full sample. Fig 1 displays the cumulative number of participants over the course of data collection. After approximately one and three months of data collection, respectively, 133 and 271 participants had completed the survey.

Seed recruitment performance
Seed characteristics and recruitment performance are displayed in Table 1. Out of the participating seeds, 34 (60%) successfully recruited at least one peer (hereafter referred to as recruiting seeds). Recruiting seeds, compared to non-recruiting seeds, were to a larger extent female (79% versus 65%), older (85% versus 61% were above 25 years) and recruited by online advertisement (71% versus 44%). Further, recruiting seeds invited two or more peers to a greater extent than non-recruiting seeds (94% vs. 63%).
The length of the recruitment chains varied between 0-27 waves (Fig 2) with a median of four waves. Disregarding the ineligible participants, the longest recruitment chain was 15 waves and with a median of three waves. The largest recruitment tree contained 133 eligible recruits, i.e., 37% of all eligible recruits.

Sample recruitment performance
Direct e-mail invitation and link invitation were the most popular invitation-methods ( Table 2). These methods were used by 79% and 17% of the full sample, and 76% and 20% of the recruits. Out of the full sample, 58% invited two or more peers to participate in the survey and 50% successfully recruited at least one peer. Out of the recruits, 55% invited two or more peers and 48% successfully recruited at least one peer. For seeds, these proportions were 83% and 60%, respectively. The median network degree was three (range 1-300) for both the full sample and the recruits. Seeds had a median network degree of four (range 1-100). The majority of the full sample, recruits and seeds stated that 60-100% of their network degree was in the same age group (+/-5 years). Recruits were mainly invited by a close friend or a co-worker.

Sample characteristics
Slightly more than half of the sample was female (54%) and employed on demand/by the hour (59%); slightly less than half of the sample was 25-29 years old (44%) ( Table 3). The majority of the sample had an income below 18 000 SEK after taxes (83%), which is the approximate median net income in Sweden [33]. Sample proportions and RDSII estimates differed by 1.3-8.1% for the full sample, and by 1-7.6% for the recruits. The sample proportions were within the estimated confidence intervals for all variables but income.
Equilibrium was reached at a cumulative number of 307 participants for sex, 157 participants for age and employment type, and 182 participants for income (Fig 3A-3D).

Discussion
WebRDS successfully recruited a sample of workers with precarious employment. The sample was sufficiently large to make estimations, generated partly by long recruitment chains, and stabilized for all variables assessed. The sample proportions approximated the population proportions and all sample proportions except income were within the estimated confidence intervals. WebRDS successfully captured a diverse sample of the population of precarious workers and had a response rate of 62%. However, we do not know how many of the invitations that were actually sent to an e-mail address in use, if links were generated without being sent out, or how many invitations that were read. Thereby, the response rate might actually be higher. The non-response rate was 38%, in comparison to 53% and 43% in previous national surveys assessing employment conditions and work environment [15,16]. The comparatively lower non-response rate in this study might be due to the convenience of online recruitment and participation, compared to for example phone interviews used in LFS [12,13]. Another benefit of RDS is the incentive system that uses a mild "peer pressure" and dual reimbursements [18,34], which could have influenced both the willingness to invite and participate. Peer-to-peer recruitment also has the potential to reach demographic groups that are underrepresented in national surveys in Sweden, such as men, younger individuals and foreign born. The seeds managed to recruit a sample consisting of 49% men and 78% individuals under 30 years of age, indicating that these demographic groups indeed are reachable. National surveys have the benefit of being selected from a population based sampling frame and the potential of giving rise to representative samples and unbiased estimates; however, with high non-response rates among demographic groups of interest regarding precarious employment, there is a need for  other data collection methods to further our understanding of the situation of this population. WebRDS could be an alternative method to successfully sample this group at a national level. All of the variables except income had sample proportions within the estimated confidence intervals. Further, the difference between the sample and estimated proportions was greater for income compared to all the other variables. One possible explanation to these results could be that there is a large variety in income in the population of precariously employed in Stockholm. In addition, low income might skew the sample towards inclusion of low income participants as the economical reimbursement may be more attractive to participants in this subgroup [20]. A larger sample size could provide a more reliable estimate for income within the population. Equilibrium was reached for all of the variables assessed, indicating that the sample composition became independent from the seeds [34]. The homophily results indicated in-group recruitment for all of the variables assessed. Age had the largest homophily estimate which might not be surprising as the absolute majority of the participants answered that the majority of their self-reported network degree is within the same age group. Homophily is related to equilibrium as high in-group or out-group recruitment leads to slower attainment of equilibrium [18,35]. Thereby, it is possible that equilibrium would have been attained with even fewer participants if there were less or no homophily.
WebRDS carries several benefits in comparison with real life RDS. For instance, the ability to connect with many peers in a short time period and not having to meet in person. The population of precarious employees is young and therefore likely to use social media and have large peer-networks online. The results of this study also indicate that a large proportion of workers with a precarious employment work on the hour/on demand, which could make it challenging to recruit co-workers or friends at work face-to-face, as well as to find a suitable time to meet with an interviewer. Thereby the costs in terms of time, effort and non-anonymity are reduced. The easiness of inviting peers through multiple online options and participation being possible in short time spans via for instance smart phones is therefore likely to have facilitated the recruitment process both in terms of the number of peers the participant could invite, and the number of peers that actually participated. The online strategy was also important in terms of recruiting successful seeds, as the majority of the recruiting seeds were recruited via online advertisement.
The longest recruitment chain was 15 waves which is within the span of previous webRDSstudies reporting recruitment chains to vary between 5-24 waves [19,24,26,28]. Further, 60% of the seeds and 48% of the recruits successfully recruited at least one peer, which is within the frame of results from previous studies, reporting rates between 20-90% [19,24,26,28] and 52%, respectively [19]. Several factors could have influenced the recruitment success in this study. One such factor could be the attractiveness of the economical reimbursements. Of the participants, 58% invited two or more peers and 53% of these participants successfully recruited two or more peers, which indicate that the reimbursements type and value were a good choice for this population. Further, precarious employment is a situation that is related to many aspects of life, such as income and health [2,8]. The PREMIS study might therefore be important and relevant to many of the participants, which could have increased the willingness to participate.
In this study, one recruitment chain made up 37% of the recruits. This is not uncommon in RDS studies as some seeds can be so called "super seeds". The successfulness of super seeds can depend on a positive feedback process as the number of participants that can recruit peers increase successiviely [19]. However, certain characteristics of seeds can increase the likelihood of successful recruitment, such as ensuring sociodemographic diversity (e.g., age, sex, living area etc.), that seeds are committed to the goals of the study and have many social ties [19,34]. In this study, motivation, as well as ability and willingness to invite peers, was assessed in the initial phone call with the study coordinator. However, it is possible that the speed of the recruitment process and number of lengthy recruitment chains could have benefited from an even more structured process in order to ensure motivation, large number of social ties and further diversity among the seeds. Especially as the results indicate that participants tend to recruit peers that share their characteristics. Therefore, we recommend future studies using webRDS to recruit precariously employed to ensure that the seed recruitment process takes these factors into consideration.

Limitations
There were several limitations to this study. First, some RDS assumptions were potentially not fulfilled in this study: sampling with replacement and random recruitment. In practice, sampling is commonly done without replacement and earlier samples might therefore have an impact on later samples. This might lead to inaccurate estimates of sampling probabilities which, in turn, could influence the estimates [20]. Further, random recruitment of peers is impossible to enforce and measure among respondents and in many cases selection of peers are based on special reasons [21,23]. Selective recruitment could result in biased estimates, for instance if network degree or certain variables are associated with recruitment [20,23].
Second, there are limitations to the study design that could have influenced the successfulness and speed of recruitment. As there is no consensus on the definition of precarious employment and as this population potentially combines several activities and/or employment types, the eligibility criteria used in terms of employment in this study might have been too inflexible. This might have caused eligible participants to be excluded and caused some recruitment chains to die out earlier than necessary. The need to indicate a personal identification number, which is deemed sensitive data, could have limited participation further. In addition, we did not include participants with a so-called co-ordination number, which is given to a foreigner residing in Sweden without a personal identification number, meaning that the study failed to include this group of the precarious worker population. Further, as the study employed an online design it was not possible to completely ensure that participants did not participate repeated times. Incorrect exclusions would result in less power and considerably fewer waves in one recruitment tree than necessary. Finally, it is possible that receiving one reward per recruited peer would have benefited the recruitment as previously done in RDS studies recruiting large samples [27,35,36].

Conclusion
WebRDS satisfactorily recruited a sample of workers with precarious employment that became independent from seeds and from which population estimates could be calculated. The online peer recruitment process produced a diverse sample with a better response rate compared to previous national surveys. Further, previously underrepresented demographic subgroups such as young and males participated to a sufficient extent. Future studies should consider implementing the WebRDS methodology in this population on a national level in order to further our understanding of workers with precarious employment.
Supporting information S1 Table. Recruitment homophily estimates for eligible recruits and eligible and ineligible recruits. (PDF) S1 File. Data collection instrument in English and Swedish (original language). (PDF) S2 File. Information given to participants and informed consent. (PDF)