The authors have declared that no competing interests exist.
Conceived and designed the experiments: JG RC PE SP PB. Performed the experiments: JG CM GJ. Analyzed the data: JG CM. Contributed reagents/materials/analysis tools: CM GJ RL. Wrote the paper: JG RC PE SP PB CM GJ RL.
Gene-environment interaction studies offer the prospect of robust causal inference through both gene identification and instrumental variable approaches. As such they are a major and much needed development. However, conducting these studies using traditional methods, which require direct participant contact, is resource intensive. The ability to conduct gene-environment interaction studies remotely would reduce costs and increase capacity.
To develop a platform for the remote conduct of gene-environment interaction studies.
A random sample of 15,000 men and women aged 50+ years and living in Cardiff, South Wales, of whom 6,011 were estimated to have internet connectivity, was mailed an invitation to visit a web-site to join a study of successful ageing. Online consent was obtained for questionnaire completion, cognitive testing, re-contact, record linkage and genotyping. Cognitive testing was conducted using the Cardiff Cognitive Battery. Bio-sampling was randomised to blood spot, buccal cell or no request.
A heterogeneous sample of 663 (4.4% of mailed sample and 11% of internet connected sample) men and women (50% female) aged 50–87 years (median = 61 yrs) from diverse backgrounds (representing the full range of deprivation scores) was recruited. Bio-samples were donated by 70% of those agreeing to do so. Self-report questionnaires and cognitive tests showed comparable distributions to those collected using face-to-face methods. Record linkage was achieved for 99.9% of participants.
This study has demonstrated that remote methods are suitable for the conduct of gene-environment interaction studies. Up-scaling these methods provides the opportunity to increase capacity for large-scale gene-environment interaction studies.
Genetic epidemiology is a dynamic science with a rapidly changing knowledge and technology base, and it has moved beyond the genome to the investigation of gene-environment interactions (GxE).
For complex (non-Mendelian) disease, GxE studies are dependent on the recruitment of large numbers of individuals in the pursuit of small effect sizes. They also require new data collections with increasingly diverse and detailed phenotyping. With increasing awareness of the genetic and epigenetic complexity underlying disease, the importance of GxE studies grows. Conventional epidemiologic methodology, involving direct (face-to-face) contact with participants, is highly resource intensive which limits the opportunity for new GxE studies. For the full benefits of GxE studies to be realised, therefore, new methods are required. Although large infrastructures have been proposed as one means of increasing cost-effectiveness,
The online environment presents an, as yet, underexploited opportunity to conduct GxE studies remotely, that is, without direct participant contact, offering the potential to significantly reduce costs. The online environment is also flexible and can be rapidly responsive to emerging hypotheses. For reasons of privacy and convenience the online environment may be preferable to participants.
Moreover, for assessments of episodic outcomes, the online environment may be a more sensitive medium than a clinic assessment. Current technological limitations would preclude the testing of some hypotheses, particularly those involving the specialist preparation of bio-samples or direct measurement of performance. However, a growing number of hypotheses could be rigorously tested entirely remotely. These would include genetic and epigenetic hypotheses. Cognitive performance related hypotheses are particularly suitable for testing remotely.
The prospect of conducting epidemiologic studies remotely was first mooted by Rothman who proposed the internet as a suitable vehicle for this purpose,
In this paper we report the field test of a platform designed to conduct GxE studies entirely remotely. The platform is an adaptation of the methods and thinking underlying UK Biobank, which provided a benchmark for recruitment, consent and follow-up procedures suitable for large-scale application.
This study received ethical approval from the South East Wales Research Ethics Committee.
A random sample of 15,000 men and women aged 50+ years and living in Cardiff, UK, was selected from the National Health Service Administrative Register. Of these, Welsh Assembly Government figures, derived from the 2007 Living in Wales Survey of households, suggest that 6,011 (40%) were connected domestically to the internet at the time of the study.
Participants were mailed an invitation letter and participant information leaflet on a single occasion only. The invitation was to participate in the ‘Age Well, Feel Good’ study of successful ageing. The mailing was conducted by the NHS on behalf of the research team to preserve the anonymity of participants. The identity of participants only became known to the research team when given by participants upon completion of the on-line consent procedure. Each invitation letter included a link to the study web-site. The link was a uniform resource locator (URL) with an embedded participant-specific identifier. This allowed the browsing behaviour of individual participants to be analysed.
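The personalised link described above can be illustrated with a minimal sketch. The paper does not describe how identifiers were generated or encoded, so the base address, the `pid` query parameter and the token length below are all assumptions made for illustration only.

```python
import secrets
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical address; not the study's real web-site.
BASE_URL = "https://study.example.org/join"


def make_invitation_links(n_invitees: int) -> Dict[str, str]:
    """Return a mapping of random token -> personalised invitation URL."""
    links: Dict[str, str] = {}
    while len(links) < n_invitees:
        # Random token: unguessable and free of personal data.
        token = secrets.token_urlsafe(8)
        links[token] = f"{BASE_URL}?{urlencode({'pid': token})}"
    return links


def token_from_url(url: str) -> Optional[str]:
    """Recover the token from a visited URL, or None if none was supplied."""
    values = parse_qs(urlparse(url).query).get("pid")
    return values[0] if values else None
```

Because the token is random rather than derived from a name or address, page visits can be grouped by invitee without revealing identity to the research team. A visitor who arrives from the homepage carries no token, which is the situation the results describe for participants who did not join through their personal URL.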
To provide participant support within a remote study paradigm, a free-phone helpline was made available. Calls were received by the Cardiff University Participant Resource Centre (PRC), a science dedicated call and mailing centre based at Cardiff University. Calls were handled by specially trained operators using scripts prepared by the research team. Escalation procedures allowed participants to speak to the study’s principal investigator if required. The PRC was operated to commercial standards with all calls being recorded for the purposes of quality control and quality assurance, and dealing with complaints.
The website had four design constraints. First, for reasons of security, it had to be entirely under the control of the research team. For this reason all code was written by the research team (CM). Second, it had to be flexible and adaptable for use in a wide range of epidemiological studies. To achieve this, a modular architecture was used, enabling content to be changed easily. For this particular study the assessment was organised into 8 themes comprising a total of 22 measurement modules. The themes covered demographics, health, cognitive performance, psychological state, social support, leisure time activity, diet and the built environment. Third, it had to be functional on a wide range of web-browsers and computer platforms as well as on dial-up and broadband connections. To achieve this we avoided the use of video, Flash, and commercial web-authoring tools. We also adhered to current web standards (HTML4, XML1.0). Fourth, it had to engage the target population as inclusively as possible, in this case middle-aged and older people with either English or Welsh as their first language. It was assumed that the target population would be computer literate but not computer sophisticated and might have slight visual impairment. For these reasons a simple colour palette and large font-size were used and all pages were designed to fit onto the screen without the need for scrolling. The site was multi-lingual.
The link within the invitation letter took participants to the study homepage with links to background information to the study and frequently asked questions. The background information process led to an opportunity to consent to participate in the study. The background information included a brief description of the study requirements, confidentiality and withdrawal procedures. Consent was requested for asking questions, following health related records, re-contact and donating a bio-sample for genetic and biochemical analysis. Once consent had been given, the participant’s name and contact details were requested and once these were given a study membership number was issued.
The website was organised into 8 themes which could be accessed in any order. These themes were designed to have face-validity to participants. Within each theme there was a sequence of modules, each covering a more specific area of measurement. Participants did not have to complete all 22 modules in a single session and could use their study membership number to return to the site repeatedly; however, each module had to be completed in a single session. The content was designed to cover a range of variables generally of interest to the epidemiology of ageing. Items were also included on health service delivery evaluation to investigate the suitability of this medium for such studies.
The impact of remote bio-sampling for genetic determination on participation in a remote study is unknown. For this reason, nested within this study was a randomised trial comparing participation between requesting a dry-blood sample, requesting a buccal cell sample, and not requesting a bio-sample. Due to participant identities not being known to the research team until the study was joined, randomisation was conducted at the point of first contact with 5,000 participants being allocated to each arm of the trial. For participants who were invited to donate a bio-sample, the appropriate bio-sampling kit was mailed to them once they had joined the study. Due to supply difficulties, bio-sampling kits were not mailed to participants until the fourth month of the study.
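The allocation at invitation can be sketched as follows. The paper states only that 5,000 invitees went to each arm; the shuffle-and-deal scheme, the arm labels and the seed below are assumptions, not the procedure actually used.

```python
import random
from collections import Counter

# Arm labels are illustrative names for the three conditions in the trial.
ARMS = ("blood_spot", "buccal_cell", "no_request")


def allocate_arms(invitee_ids, seed=2009):
    """Shuffle invitees and deal them into equal-sized arms."""
    ids = list(invitee_ids)
    if len(ids) % len(ARMS) != 0:
        raise ValueError("number of invitees must divide evenly across arms")
    rng = random.Random(seed)  # assumed fixed seed so the allocation is reproducible
    rng.shuffle(ids)
    per_arm = len(ids) // len(ARMS)
    # After shuffling, the first block goes to arm 0, the next to arm 1, and so on.
    return {pid: ARMS[i // per_arm] for i, pid in enumerate(ids)}


allocation = allocate_arms(range(15_000))
print(Counter(allocation.values()))
```

Allocating before consent means the arm is fixed before anyone's identity is known, which is why the comparison between arms is made among those who later joined rather than among all randomised invitees.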
Follow-up was by establishing linkage with the National Health Service Administrative Register. Follow-up through re-contact has not yet been attempted.
Age was recorded in years and grouped into 10-year bands. Deprivation was measured using the Welsh Index of Multiple Deprivation score for the Lower Super Output Area (LSOA) of each participant's address, expressed as a quintile based on the ranking of all LSOAs in Wales, with 1 describing the least deprived LSOAs and 5 the most deprived. Means of clinic-administered and web-administered cognitive test and wellbeing scores were compared by t-test. Differences between means were considered statistically significant at p<0.05.
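A comparison of two means from published summary statistics can be sketched as below. The paper does not say which form of t-test was used, so this uses the unequal-variance (Welch) statistic with a normal approximation for the p-value, which is reasonable only because the samples are large. The example values are the fluid intelligence means and SDs from the results table; n = 540 for the web sample is an assumption, since the analysed sample varied between 540 and 594.

```python
import math


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and a two-sided p-value (normal approximation)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    t = (mean1 - mean2) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail area of the standard normal
    return t, p


# Web-based sample versus the clinic-based Airwave sample (fluid intelligence score).
t, p = welch_from_summary(5.03, 2.08, 540, 4.96, 1.85, 9234)
print(f"t = {t:.2f}, p = {p:.2f}")
```

With these inputs the approximate p-value is well above 0.05, in line with the non-significant difference reported for this score.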
The PRC received 200 calls covering a range of topics. Most were requests for further details of the study and confirming the study’s bona-fides. There were, however, 7 (0.05%) complaints at being invited to participate.
The use of individual URLs enabled the passage through the consent procedure to be analysed. Altogether 92 persons visited the site and did not join the study. Drop-outs occurred at pages giving the study overview (35%), at the point of consent (37%) and at the point of giving personal details (11%). All other drop-outs were evenly spread between pages describing bio-sampling, record linkage, withdrawal or re-contact (17%). However, not all participants joined the study using their personal URL. This may have been for a variety of reasons, including firewall issues or using search engines to check out the study on the web prior to joining, and then joining directly from the homepage. It may also be that some participants had heard about the study virally rather than by invitation letter.
After 22 weeks 663 participants had joined the study. Sampling bias was evaluated in terms of age, sex and social deprivation (
| Variable | Category | Invited, n (%) | Consented, n (%) | P value (2-sided) |
| Age (years) | 50–59 | 5,251 (35%) | 267 (40%) | <0.001 |
| | 60–69 | 4,524 (30%) | 280 (42%) | |
| | 70–79 | 3,020 (20%) | 85 (13%) | |
| | 80–89 | 1,847 (12%) | 29 (4%) | |
| | 90+ | 358 (3%) | 2 (0.5%) | |
| Gender | female | 7,758 (52%) | 330 (49.8%) | 0.33 |
| | male | 7,242 (48%) | 333 (50.2%) | |
| Deprivation | 1 (least deprived) | 7,156 (48%) | 441 (67%) | <0.001 |
| | 2 | 1,646 (11%) | 75 (11%) | |
| | 3 | 1,514 (10%) | 43 (7%) | |
| | 4 | 1,886 (12%) | 56 (8%) | |
| | 5 (most deprived) | 2,798 (19%) | 48 (7%) | |
Welsh Index of Multiple Deprivation.
Of the 663 participants, 576 (87%) joined within 4 weeks of the mailing and 636 (96%) within 8 weeks of the mailing (
Of the 663 participants who joined the study, 642 provided data. For the 642 who provided data completion rates varied between modules from 99% for demographic details to around 85% for health service evaluation (
| Theme | Module (items) | Variables | Completion rate |
| Your circumstances | Demographics (16) | Marital status, etc. | 97–99% |
| | Sleep (7) | Sleeping habits | 95% |
| Health | General health questions (35) | Perceived health, disability, ADL | 98% |
| | Major health questions (45) | Doctor diagnosed illness | 94% |
| | Sight and hearing (9) | Difficulties affecting lifestyle | 93% |
| | Dental (15) | Photo identification of dental illness | 90–93% |
| Thinking | Mood (14) | HADS | 95% |
| | Fluid intelligence (12) | Numeric and verbal reasoning | 86% |
| | Reaction time (60) | Two choice | 90% |
| | Episodic memory (12) | Paired associates learning | 86% |
| | Working memory (1–12) | Forward digit recall | 87% |
| | Attention (30) | Stroop non interference reaction time | 84% |
| | Attention (30) | Stroop interference reaction time | 84% |
| Feelings | Wellbeing (22) | Life satisfaction, Self esteem, Self efficacy | 90–92% |
| People | Social support (19) | Emotional and practical support | 91–94% |
| Leisure time | Leisure activity (33) | Physical and sedentary activity | 93% |
| | Smoking (16) | Current and past smoking behaviour | 91% |
| | Diet (18) | Food frequency questionnaire | 93% |
| | Alcohol (12) | Frequency and type of consumption | 91% |
| Place | Perceived built environment (19) | General neighbourhood quality | 88% |
| | Observed built environment (24) | Observed street quality from front door | 86% |
| Health and social care | Service evaluation (11) | GP, hospital, pharmacy and dental | 82–86% |
Range of completion given when completion varied within module according to item.
Hospital Anxiety Depression Scale.
| Domain | Variable | Web based: Age Well Feel Good Study (n = 540), mean (SD) | Cronbach’s α | Clinic based: Airwave Study (n = 9,234), mean (SD) | Cronbach’s α | Clinic based: Caerphilly Study (n = 964), mean (SD) | Cronbach’s α | P for difference between mean values (2-sided) |
| Cognitive function | Fluid intelligence score | 5.03 (2.08) | – | 4.96 (1.85) | – | – | – | 0.4 |
| | Working memory score | 7.79 (2.09) | – | 7.96 (1.26) | – | – | – | 0.07 |
| | Two choice reaction time (ms) | 482 (106) | – | 538 (107) | – | – | – | <0.001 |
| | Stroop interference effect (ms) | 460 (356) | – | 336 (703) | – | – | – | <0.001 |
| Wellbeing | Life satisfaction score | 25.9 (6.2) | 0.89 | – | – | 26.3 (6.3) | 0.87 | 0.2 |
| | Self efficacy score | 26.5 (4.6) | 0.87 | – | – | 25.1 (5.0) | 0.85 | <0.001 |
| | Self esteem score | 45.2 (8.1) | 0.85 | – | – | 44.7 (8.7) | 0.92 | 0.08 |
The sample size varies according to analysis between 540 and 594.
Comparison wellbeing data were available for 964 men aged 65–80 years from the Caerphilly Prospective Study, which also used clinic-based assessment. Web-based mean self-efficacy score was slightly higher (1.4 points, p<0.001) (
The trial of the impact of making a bio-sample request on recruitment cannot be analysed strictly on an intention-to-treat basis as, necessarily, randomisation occurred at invitation rather than post-consent. Of the 549 participants for whom randomisation status was known, 196 (36%) were respondents who were not asked to provide a bio-sample; 182 (33%) were respondents who were asked to donate a buccal cell sample, of whom 136 (75%) did so; whilst 171 (31%) were respondents who were asked to donate a dry blood sample, of whom 119 (70%) did so.
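The counts above can be checked against the equal allocation of invitations with a short calculation. This is an illustrative sketch only: the paper does not report a formal test for this comparison, the choice of a chi-square goodness-of-fit test is an assumption, and joiners whose randomisation status was unknown are necessarily excluded.

```python
import math

# Respondents by trial arm, and donations among those asked, as reported in the text.
respondents = {"no_request": 196, "buccal_cell": 182, "blood_spot": 171}
donated = {"buccal_cell": 136, "blood_spot": 119}

total = sum(respondents.values())
expected = total / len(respondents)  # equal arms of 5,000 invitations each

# Goodness-of-fit statistic against equal recruitment from each arm.
chi2 = sum((obs - expected) ** 2 / expected for obs in respondents.values())
p_value = math.exp(-chi2 / 2)  # exact chi-square upper tail for 2 degrees of freedom

shares = {arm: n / total for arm, n in respondents.items()}
donation_rates = {arm: donated[arm] / respondents[arm] for arm in donated}

print(f"chi2 = {chi2:.2f}, p = {p_value:.2f}")
for arm, rate in donation_rates.items():
    print(f"{arm}: {rate:.0%} donated")
```

Under these assumptions the statistic gives no evidence that recruitment differed between arms, which agrees with the conclusion that requesting a bio-sample had little effect on participation.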
Linkage was achieved for 662/663 (99.9%) of participants.
The utility of a platform for the remote conduct of GxE studies has been demonstrated. In a representative sample of older people, 11% of those estimated to be connected to the internet consented to participate, of whom 97% (642/663) provided data. A randomised trial nested within the study found that the request for a bio-sample had little effect on participation. The donation rate among those who agreed to provide a bio-sample was 70–75%.
There were several challenges to overcome before ethical approval for this study was obtained. These were largely due to the combination of technologies that were being proposed. The major issue was the linking of genetic information with clinical records. The commitment to use a fully secure and de-identified database for linkage and subsequent analyses was considered to provide adequate protection for participants.
The very low rate of complaint (7 of 15,000 invitees, 0.05%) is consistent with evidence from previously conducted qualitative studies that, in principle, remote methods are acceptable to the public.
The response rates achieved here are difficult to assess accurately as it was not known beforehand which invitees had internet access. Based on Government figures it is likely that 6,011 invitees were internet connected, giving a response rate of 11%. The Government figures were based on a representative sample of 7,728 households throughout Wales (reflecting a 71% response rate) surveyed in 2007. In terms of the mailed sample, 4.4% (663/15,000) responded. Given that this is an older population with limited internet connectivity, this response may be considered comparable to the 5.4% achieved in UK Biobank.
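Because the denominator depends on an estimate of connectivity, the response rate can be recomputed under different assumptions. In the sketch below the counts are taken from the text, while the 30% and 50% connectivity values are hypothetical bounds chosen only to show the sensitivity of the result.

```python
def response_rate(joined: int, mailed: int, connectivity: float = 1.0) -> float:
    """Joined participants as a fraction of invitees assumed to be online."""
    if not 0 < connectivity <= 1:
        raise ValueError("connectivity must be in (0, 1]")
    return joined / (mailed * connectivity)


JOINED, MAILED = 663, 15_000

print(f"mailed sample: {response_rate(JOINED, MAILED):.1%}")
# 6,011 / 15,000 is the Government-based estimate; 0.30 and 0.50 are hypothetical.
for connectivity in (0.30, 6_011 / 15_000, 0.50):
    rate = response_rate(JOINED, MAILED, connectivity)
    print(f"connectivity {connectivity:.0%}: {rate:.1%}")
```

The lower the true connectivity among invitees, the higher the implied response rate among those able to take part.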
Remote methods, in which recruitment costs are minimised and participation restricted by computer access, bring the issue of selection bias into sharp focus. A helpful distinction is between descriptive and etiologic studies. The former describe specific populations. For descriptive studies to achieve unbiased estimates of prevalence, incidence or normal ranges, representative samples are required. Etiologic studies investigate mechanisms that occur across populations. For these studies heterogeneous population samples are required so that the range of values for an exposure is available to the analysis. Also required is the non-differential ascertainment of incident outcomes. GxE studies are not designed to describe specific populations. As such, response rates affect cost rather than bias. Similarly, remote methods are not generally suitable for descriptive studies but for testing etiologic hypotheses. Our study has demonstrated, in terms of age, sex and deprivation, that heterogeneity can be achieved using remote methods.
Heterogeneity in this study was achieved by dint of numbers rather than by a systematic method, such as random sampling. It is unlikely that the heterogeneity available to the analysis would have been materially affected had we used a different recruitment method, such as a media campaign, provided we recruited sufficient numbers.
Rather than requiring all studies to be representative, a preferred strategy is to identify mechanisms using etiologic studies and then apply that knowledge to specific populations. Clarifying and separating these goals enables more efficient study design, as etiologic studies may be conducted without the unnecessary burden (and cost) of having to achieve representativeness, and descriptive studies may be conducted without the unnecessary burden (and cost) of having to achieve large sample size.
A further issue is the validity and reliability of web-based assessment. Evidence largely supports comparability between measurement media for questionnaires.
Requesting a bio-sample appeared to have only a small effect on participation. It appears that if the rationale for the study is persuasive, the donation of genetic material is not problematic. Furthermore, although the donation of dried blood was not a painless exercise, this also was widely acceptable. The actual donation rates (70–75% according to sample), although useful for planning purposes, are likely to be conservative due to the passage of several months between most participants joining the study and being mailed the sampling kit.
Etiologic studies also require non-differential ascertainment of outcomes. In practice, this means very high follow-up rates. Here the principal follow-up method was by record linkage. The high level of linkage achieved may not be surprising given the initial invitations were based on the National Health Service Administrative Register database. However, although follow-up by electronic linkage may virtually eliminate attrition, for many hypotheses e.g. those involving inadequate routine measurement of outcomes such as common mental disorder, or those involving change over time such as cognitive decline, follow-up by re-contact is required.
This study cost around £100 per participant. This mostly involved IT development costs, reflecting the cost structure of remote studies being front-loaded compared to traditional methods. For larger studies, on the basis of subsequent cost being due largely to mailing and bio-sampling, we crudely estimate, on the basis of a 10% response rate, and a 70% bio-sample donation rate, that recruitment and bio-sampling for a GxE study of 50,000 would cost around £15 per participant over an 18 month period. These
Many limitations remain to be overcome before remote methods are as clearly understood and accepted as face-to-face methods. Although we have achieved linkage we have not downloaded data, but the system proposed for this is currently being used in a large e-cohort of births in Wales.
GxE studies offer the prospect of robust causal inference through both gene identification and instrumental variable approaches.