Abstract
Online surveys often include quantitative attention checks, but inattentive participants might also be identified using their qualitative responses. We used the software Turnitin™ to assess the originality of open-ended responses in four mixed-method surveys that included validated multi-item rating scales (i.e., constructs). Across surveys, 18-35% of participants (n = 3,771) were identified as having copied responses from online sources. We assessed indicator reliability and internal consistency reliability and found that both were lower for participants identified as using copied text versus those who wrote more original responses. Those who provided more original responses also provided more consistent responses to the validated scales, suggesting that these participants were more attentive. We conclude that this process can be used to screen open-ended responses from online surveys. We encourage future research to replicate this screening process using similar tools, investigate strategies to reduce copying behaviour, and explore the motivation of participants to search for information online, including what sources they find compelling.
Citation: Koralesky KE, von Keyserlingk MA, Weary DM (2025) Assessing construct reliability through open-ended survey response analysis. PLoS ONE 20(4): e0320570. https://doi.org/10.1371/journal.pone.0320570
Editor: Nicolas Jacquemet, Paris School of Economics, FRANCE
Received: October 28, 2024; Accepted: February 20, 2025; Published: April 1, 2025
Copyright: © 2025 Koralesky et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All code and data can be found in the Supplemental Materials (https://doi.org/10.5683/SP3/PEPATK).
Funding: This work was funded by the Government of Canada through Genome Canada and the Ontario Genomics Institute (OGI-191). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: Semex (Guelph, Ontario, CA) and Acceligen (Eagan, Minnesota, USA) are industry partners to the Genome Canada grant. They did not provide any input on the research or writing of this paper. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
Introduction
Online surveys are commonly used by researchers to examine public views on a variety of topics. As the use of online surveys has grown, so has the research on survey data quality [1,2–4]. For example, research on the advantages and drawbacks of using online platforms has led to the development of recommendations [5,6] and a call for the use of reporting standards [7]. Assessing participant “attention” (or “inattention”) is a common recommendation, and disciplines use terms such as “careless responding”, “insufficient effort responding” or “random responding” to characterize this behaviour [4]; we use the term “attentiveness” in this paper. Much of this research focuses on Amazon Mechanical Turk (MTurk), likely because it is one of the most frequently used platforms [5,8]. MTurk provides a large participant pool, reasonable cost, flexibility regarding research design, accessibility and efficient data collection [5], and can provide quality data [9,10]. However, MTurk is also susceptible to inattentive workers, misrepresentation by workers, workers with inconsistent English language fluency, and vulnerability to code used to automatically complete surveys (i.e., bots) [2,5,11]. Researchers have investigated these concerns and confirmed that the MTurk Approval Rating system is not an effective measure for improving data quality [12] and that MTurk data quality has decreased over time [13]. Platforms like CloudResearch and Prolific can generate higher quality data [14], but are subject to the same challenges.
Researchers have developed specific strategies to improve data quality [2,5,15], including measures of survey completion time as an indirect way to identify inattentive participants [16,17] and bot detection [18]. Attention and manipulation checks are commonly used to identify whether participants have read and understood the questions; these can include asking participants to respond to logical statements and directed queries [19], instructed manipulation checks (i.e., asking participants to select a specific response on a scale) [20], and others [21].
Attention checks can improve data quality but also have drawbacks [2,21,22]; for example, checks are sensitive to participant learning, vary in difficulty [2], and can frustrate participants [19], as well as distract them and interrupt survey flow [23]. Also, participant attentiveness can vary across survey topics, attention check questions, and within participants at different points in time [21]. Some research has found that attention checks can bias samples based on participant education, age and gender [4]. Attentiveness is likely not a stable participant characteristic, but rather depends on survey context, and some have suggested including multiple attention check questions and stratifying results by attention check response to avoid sampling biases [4].
Analysis of open-ended qualitative responses provides another opportunity for improving data quality. Open-ended responses are common in mixed-method surveys, where researchers typically collect and analyze quantitative and qualitative data [24]. Screening of qualitative responses has been suggested as an additional attention check strategy [5]. Common screening strategies include removing participants who provide responses that cannot be meaningfully interpreted or are nonsensical [13,25]. Participants can also be removed if their responses fail to meet a minimum length or mention only a few themes, with fewer themes taken to indicate a lack of familiarity with the topic [26]. Such screening is usually done manually, so it requires additional time to review individual survey responses and involves subjective researcher judgment [19].
We screened qualitative data from two of our studies using the strategies above [27,28], but noticed some similarities across responses. We pasted several responses into the search engine Google and found they were copied word-for-word from online sources. How often participants use online sources to answer MTurk survey questions is unknown [11], and few studies to date have investigated this behaviour. For example, [29] analyzed how often participants self-reported looking for answers to factual political knowledge questions (e.g., the current USA unemployment rate) and tested interventions to decrease this behaviour. In another instance, [30] detected “search behaviour” by asking participants a very difficult political knowledge question and considered correct answers indicative of search behaviour. Next, [31] confirmed that participants commonly looked online for answers to political knowledge questions, and discussed how this behaviour harms survey validity and can lead to misrepresentations of public knowledge. Using a different approach, [32] developed TaskMaster, a JavaScript tool that can be integrated into Qualtrics surveys and monitors when participants leave the survey window; this tool has been used to confirm self-reported search behaviour on survey questions [12].
The aim of this paper is to describe a novel screening process for open-ended responses in online surveys. We used the online writing tool Turnitin™ [33] to assess the originality of open-ended responses collected in four different public surveys about genetic engineering in farm animals. Turnitin™ is available in many universities, easy to use and detects text that has been copied from online sources. We first describe the frequency and percentage of participants who were identified by Turnitin™ as providing responses copied from online sources, and use a two-sample t-test to compare survey completion time for participants who copied versus those who wrote more original responses. We then describe indicator reliability and internal consistency reliability (using Cronbach’s alpha) of multi-item rating scales for participants who provided copied versus more original responses, and statistically compare Cronbach’s alpha values. We report indicator reliability and internal consistency reliability as our main outcomes to demonstrate that participants who provided copied responses responded less consistently to multi-item rating scales than those who provided more original responses. We conclude with a discussion about the motivation of participants to search for information online when responding to surveys and limitations of the study.
Method
Ethics
We assessed our screening process using responses to four different surveys collected in two studies, both approved by the University of British Columbia (UBC) Behavioural Research Ethics Board (Study 1 # H21-00047 and Study 2 # H22-00873).
Participant recruitment
We created surveys in Qualtrics (Provo, UT, USA) and used MTurk to recruit participants from the USA who were 18 years of age or older (see Supplemental Materials for a copy of each survey; https://doi.org/10.5683/SP3/PEPATK). Participant quotas were set based on representative data from the United States Census Bureau [34]. We ran each survey at different times during 2021 and 2022, beginning on June 10, 2021 and ending on May 24, 2022. This recruitment resulted in a convenience sample of MTurk participants who were willing to take the survey when it was available. Participation was anonymous and confidential and each participant provided written consent to take the survey. We compensated participants $1 USD.
Survey design and measures
This study uses open-ended qualitative data from four different surveys. In Study 1, we conducted three separate surveys (henceforth referred to as Study 1, Surveys 1, 2 and 3) and asked participants to describe their understanding of genetic engineering technology terms (e.g., gene editing, genetic modification; Surveys 1, 2) and the perceived acceptability of genetic engineering in farm animals (Survey 3; [27]; see Supplemental Materials; https://doi.org/10.5683/SP3/PEPATK). In Study 2, we conducted one survey (henceforth referred to as Study 2, Survey 4) soliciting top-of-mind public questions about gene editing in farm animals ([28]; see Supplemental Materials; https://doi.org/10.5683/SP3/PEPATK).
Each survey also included validated multi-item rating scales that typically achieve high levels of internal consistency and are used to create a construct (e.g., trust). In Study 1 [27] we used a causal trust-acceptability model involving four constructs [35,36] previously used to assess social acceptance of novel technologies [e.g., 36,37]. Each construct includes individual indicator variables: acceptance (4 variables), benefit (3 variables), risk (3 variables), and social trust (5 variables). In Study 2 [28], we used several scales: the Aversion to Tampering with Nature scale (ATN; 5 variables [38]), benefit, risk and attitude scales (3 variables each [39]), and a social trust in institutions scale (4 variables [35]).
Data preparation and analysis
Each survey used the same three data quality screening procedures (full details are provided in each study), removing: 1) incomplete responses, 2) responses from participants who failed the instructed manipulation check (participants were asked to select a specific response on a scale) [20], and 3) responses from participants who did not respond to the qualitative question or who provided nonsensical text (e.g., “nice animals”, “engineered animals are numerals”; [26]).
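As an illustration only, the following is a minimal sketch of how these screening steps might be applied to an exported response file; the file name, column names and the instructed response value are hypothetical and not part of the original studies, and nonsensical text was screened manually rather than automatically.

```python
import pandas as pd

# Hypothetical export of survey responses; file and column names are assumptions.
df = pd.read_csv("survey_responses.csv")

# 1) Remove incomplete responses.
df = df[df["finished"] == 1]

# 2) Remove participants who failed the instructed manipulation check,
#    e.g., if they were instructed to select response option 5.
df = df[df["imc_response"] == 5]

# 3) Remove participants who left the open-ended question blank; nonsensical
#    text was identified manually in the original studies.
df = df[df["open_response"].fillna("").str.strip().str.len() > 0]
```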
To systematically determine if responses were copied, we submitted all remaining open-ended responses to Turnitin™ using the Quick Submit function. We selected search repositories available through UBC’s Turnitin™ license, including the internet, student papers and periodicals, journals and publications. We found that most responses were copied nearly word-for-word from online sources, indicated by Turnitin™ highlighting the entire response; categorizing these responses as copied was straightforward. However, some responses had only a portion of the text highlighted, for example when phrases like “the genome of an organism” or “DNA being modified or edited” were used. For responses with a portion of text highlighted, we categorized those with more than 50% of the content identified as similar to online sources as “mostly copied” (for ease, henceforth referred to as “copied”), and responses with less than 50% of the content identified as similar to online sources as “not mostly copied” (for ease, henceforth referred to as “not copied”).
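To make the categorization rule concrete, here is a minimal sketch assuming the Turnitin™ similarity percentage for each response has been recorded in a spreadsheet; the file and column names are hypothetical.

```python
import pandas as pd

# Hypothetical file with one row per participant and the Turnitin similarity
# score (0-100%) for their open-ended response.
scores = pd.read_csv("turnitin_similarity.csv")

# Responses with more than 50% of their content identified as similar to
# online sources are categorized as "copied"; all others as "not copied".
scores["category"] = scores["similarity_pct"].apply(
    lambda pct: "copied" if pct > 50 else "not copied"
)

print(scores["category"].value_counts())
```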
We could not conduct a priori power analyses because we did not anticipate that participants would perform search behaviour and look for information online. In this study, we analyzed data descriptively and used different procedures to assess statistical differences, reporting effect size where possible. First, in each group (copied and not copied) for each of the four surveys, we categorized participant demographic information, performed two-sample t-tests to compare mean survey completion times between the copied and not copied groups, and calculated effect size using Cohen’s d. We then assessed the reliability of constructs by determining indicator reliability and internal consistency reliability [40]. Indicator reliability measures the correlation between a construct and its individual indicator variables (i.e., questions). To assess indicator reliability, we used PROC CALIS in SAS® OnDemand for Academics (version 9.4, SAS Institute Inc.) to assess model fit for each construct and the corresponding indicator variables using a measurement model. We measured indicator reliability for each construct in each group (copied and not copied), for each of the four surveys. This procedure generates a t-value for each indicator variable to validate the relationship between the indicator variable and construct; significant t-values support the validity of the measurement model. Indicator reliability estimates above 0.7 are recommended, loadings between 0.4 and 0.7 should be considered for removal, and loadings below 0.4 should be removed [40].
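To illustrate the completion-time comparison (the analyses themselves were run in statistical software as described above), a minimal Python sketch of a two-sample t-test with a pooled-standard-deviation Cohen’s d; the input arrays of completion times in seconds are hypothetical.

```python
import numpy as np
from scipy import stats

def compare_completion_times(copied, not_copied):
    """Two-sample t-test and Cohen's d for completion times (in seconds)."""
    copied = np.asarray(copied, dtype=float)
    not_copied = np.asarray(not_copied, dtype=float)
    result = stats.ttest_ind(copied, not_copied)
    # Cohen's d using the pooled standard deviation.
    n1, n2 = len(copied), len(not_copied)
    pooled_sd = np.sqrt(((n1 - 1) * copied.var(ddof=1) +
                         (n2 - 1) * not_copied.var(ddof=1)) / (n1 + n2 - 2))
    d = (copied.mean() - not_copied.mean()) / pooled_sd
    return result.statistic, result.pvalue, d
```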
To assess internal consistency reliability, we used Cronbach’s alpha. Internal consistency reliability measures the degree of reliability (i.e., interrelatedness of indicator variables) for each construct [40]. We measured internal consistency reliability for each group (copied and not copied), for each of the four surveys. Cronbach’s alpha can range from 0 to 1, with values from 0.7 to 0.95 generally considered acceptable, although the value is also a function of the number of items in a scale [41]. We statistically compared Cronbach’s alpha values using the cocron web interface [42], a platform-independent R package that tests the null hypothesis that the compared alpha coefficients are equal using a χ2-distributed test statistic.
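For reference, Cronbach’s alpha can be computed directly from the item-response matrix; below is a minimal sketch using the standard formula (the statistical comparison between groups was done with cocron and is not reproduced here, and the input matrices are hypothetical).

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_participants x k_items) response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Usage (hypothetical): alpha computed separately for each group.
# alpha_copied = cronbach_alpha(copied_item_matrix)
# alpha_not_copied = cronbach_alpha(not_copied_item_matrix)
```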
Results
Our first aim was to describe how often participants provided copied responses. The four surveys included a total of 3,771 participants, 837 of whom were identified as providing responses copied from online sources. Across surveys, 18–35% of participants provided content identified as at least 50% copied: Study 1, Survey 1: 18%; Study 1, Survey 2: 18%; Study 1, Survey 3: 24%; Study 2, Survey 4: 35%. Table 1 provides the exact open-ended question asked and demographic information about the participants in each survey, separately for those who provided content identified as mostly copied versus those who did not. Generation was categorized following [43], with ages adjusted to reflect the data collection date: Generation Z includes participants ≤ 24 years of age, Millennial includes participants aged 25–40, and Pre-millennial includes participants aged ≥ 41 years.
Using two-sample t-tests, we found no difference in completion times between participants identified as having copied responses versus those not identified as having copied responses for Surveys 1 and 3 in Study 1: mean ± SE completion times were 5 min 27 s ± 22 s vs. 5 min 18 s ± 12 s (t(1193) = −0.31, p = 0.75; d = 0.02), and 5 min 8 s ± 17 s vs. 5 min 37 s ± 10 s (t(1126) = 1.42, p = 0.16; d = 0.10) for copied and not copied responses in these two surveys, respectively. In the other two surveys we found differences, but these were inconsistent; participants who copied took longer to complete Survey 2 in Study 1: 5 min 19 s ± 23 s vs. 4 min 26 s ± 9 s (t(935) = −2.52, p = 0.01; d = 0.21). In Study 2, those who copied completed the survey more quickly than those who did not: 6 min 24 s ± 19 s vs. 8 min 5 s ± 22 s (t(526) = 2.96, p = 0.003; d = 0.27).
Our second aim was to describe indicator reliability and internal consistency reliability of each construct in each survey for participants who provided copied versus more original responses. In Study 1, Surveys 1, 2 and 3, every indicator reliability estimate for the constructs of Acceptance, Benefit and Risk was weaker for participants who copied than for participants who provided more original responses (Table 2). For Social Trust in Study 1, estimates were likewise stronger for each indicator in the not copied group, with the exception of T5 in Surveys 1 and 2, and T3, T4 and T5 in Survey 3; in those cases, indicator estimates were stronger for participants who copied than for those who did not. In Study 2, Survey 4, all indicators for participants who copied fell within the range to be considered for removal (i.e., 0.4–0.7); indicators for those who provided more original responses were above the recommended threshold (> 0.7; Table 3).
The internal consistency reliability of all constructs was lower for participants who copied responses to open-ended questions versus those who provided text that was not copied (Tables 4, 5). This finding held across all four surveys included in the current study. We compared Cronbach’s alpha values and found that all but two were statistically different between the copied and not copied respondents. Consistent with the indicator reliability estimates reported above, there was no statistical difference for the Social Trust construct in Study 1, Survey 2 and Survey 3 between the copied and not copied groups.
Discussion
We assessed responses of 3,771 participants, recruited to participate in four different surveys and responding to quantitative scales used to develop nine different constructs, and found that most indicators and all constructs were less reliable for participants who provided text identified as copied than participants who provided more original responses. In all but two comparisons, Cronbach’s alpha values were significantly higher when considering only participants who provided more original responses. It is important to note that our analysis considered data from participants who had passed the instructed manipulation check [20] in the surveys and who wrote coherent responses. We found no consistent evidence that survey completion time varied in relation to the provision of copied text; this result agrees with previous work suggesting that survey completion time is not sensitive or specific for identifying participants who may be less engaged with the survey [4]. Research has found that most online survey participants tend to pass attention check questions, and thus there may be a need for more nuanced data quality control strategies [1,44]. Our screening process can provide nuanced data quality control and responds to calls to incorporate open-ended response analysis to screen data [5].
We found that Cronbach’s alpha values were higher for participants who provided more original responses. Inattentive participants often do not answer validated scales consistently [15]. Because our aim was to analyze public viewpoints, we considered the responses of participants who copied to be of lower quality than those who wrote more original responses. However, it is possible that for some participants copying was a thoughtful attempt to provide a well-researched response that in some cases may be more valuable than original but top-of-mind responses. The authors of [29] found some evidence that participants displayed a form of socially desirable responding called self-deceptive enhancement, where, in the case of taking online surveys, people believe they know something because they know where to find the information online. In our surveys, participants may have misunderstood survey instructions, lacked familiarity with the survey format, or sought information about a topic (genetic engineering) for which they had limited technical knowledge and found it difficult to answer the questions. For some of these individuals, copying may have been an attempt to engage with the survey, rather than disengage. In future surveys, researchers might consider using strategies that have been shown to reduce online searching, including simply asking participants to avoid online searches [29,30].
We found it surprising that participants copied responses even though we specifically asked that they provide their views and top-of-mind responses and questions. It is not clear why participants would choose to copy answers from online sources; time spent looking for answers (and not completing additional surveys) is an opportunity cost for MTurk workers [29]. One hypothesis is that these participants found it easier to copy text than to compose their own response. This aligns with [45], who surveyed UK workers (i.e., people taking surveys) from three large online survey platforms, and reported workers felt ambivalent about completing tasks they described as “neither work nor leisure”. That said, participants who simply wanted to receive payment for the survey could input any response, even one-word or incoherent responses. We think it is unlikely these responses were from bots that automatically completed surveys. Another hypothesis is that these responses were from participants who took the time to search for related answers and copied this material. In this case these responses would reflect what sources people identify when searching for related information, and if there are common sources of information. These responses could potentially identify what arguments and evidence people find compelling. Future work might explicitly direct participants to search for materials that they find compelling or helpful in increasing their understanding of a topic, turning what otherwise might be considered as copying into data on the sources people turn to when considering substantive issues.
In our surveys, 18 to 35% of participants provided responses we flagged as copied. We are unsure why this rate varied across surveys. MTurk data quality has decreased over time [13], and we conducted Study 2, Survey 4 (in which 35% of participants provided copied responses) approximately one year after data collection for Study 1, Surveys 1, 2 and 3. All four surveys included multi-item rating scales and two open-ended questions each, and were approximately the same length. Others have reported similar rates of participants searching for answers to survey questions online [30,31]. Across 30 datasets, [19] reported average participant losses from different forms of attention checks: 14% (directed queries), 18% (logical statements) and 20% (manipulation checks). While removing these participants improved construct and scale fit [19], removing participants to improve data quality must be done with caution. Removal can result in loss of power (due to reduced sample size) or increased power (due to reduced variation within treatment); how these two factors balance will depend upon the specifics of the case. Further, removing participants can disproportionately exclude certain demographic characteristics (e.g., age, gender, education) and further induce bias [4]. We did not note any obvious relationship between copying and participant age, gender, or time to complete the surveys, but there appeared to be some association with education level. Further work may wish to follow up on these results.
While our surveys focused on perspectives about the use of emerging biotechnologies like genetic engineering in animals, screening open-ended responses using Turnitin™ may be generalizable to other topics, for example research investigating public engagement with science and technology. This research has evolved over time [46], and scholars are now rethinking how to design engagement exercises that involve publics more deliberately, shifting away from a one-way flow of information from experts to members of the public [47]. Our screening process identified how some survey participants engaged with the topic on their own by searching for information online. Further research on a range of topics, both technical and familiar, is required to better understand the motivation and methods used for searching for online content.
Categorizing participants as “inattentive” or “careless” is likely oversimplified. Determining participant attentiveness is nuanced, leading to recommendations for transparent reporting of screening measures [4]. In addition to reporting screening measures in this paper, we avoided terms like “inattentive” to describe our sample, and instead described participants as those who copied and those who wrote original responses. Future work might assess construct reliability of participants removed using our screening process versus participants removed using common quantitative attention checks (e.g., logical statements, instructed manipulation checks). As well, our screening process could be used to determine if participants who fail common quantitative attention checks are more likely to provide copied responses.
Adding a survey question to assess participant knowledge may have helped identify participants who copied due to a lack of knowledge versus those who copied due to inattentiveness. Assessing objective or subjective knowledge about genetic engineering technology in public opinion surveys is common, but these assessments are predetermined by researchers and likely miss some types of knowledge participants have [28]. Asking participants what they want to know can provide information about public understanding on a topic, and which topics they want to know more about [28].
Our study has limitations. MTurk has a large participant pool [5], but we acknowledge that our samples were not representative. We found that all but two Cronbach’s alpha values were statistically different between the copied and not copied groups, but the procedure we used does not report effect size [42]. We did not include additional qualifications for MTurk workers (e.g., a Human Intelligence Task Approval Rating), although this qualification is not always effective for controlling data quality [12]. While most responses were copied word-for-word, in cases where Turnitin™ highlighted only part of the text as copied, we categorized responses as copied or not copied using a 50% threshold. Others may wish to classify responses in a different way or assess different thresholds. Turnitin™ may be unable to detect the use of artificial intelligence tools like ChatGPT to generate survey responses. We encourage other researchers to replicate our screening process to assess the originality of open-ended survey responses using other tools (e.g., iThenticate) and platforms (e.g., CloudResearch) to further develop this screening method.
Conclusion
We used Turnitin™ to categorize open-ended qualitative responses from four different surveys about genetic engineering. Most indicators and each construct in each survey were less reliable for participants who provided copied responses versus those who provided more original responses. We encourage researchers who use mixed methods or conduct surveys with open-ended questions to use this or similar tools to identify such participants, and to consider removing these from their dataset, or analyzing these responses separately. Future work should also investigate what sources of information are preferentially used by participants, and the reasons why people choose to copy rather than write original responses.
Acknowledgments
We thank Anjali Parthasarathy and Jillian Hendricks for their assistance with analysis. We thank Matthew Billet for his helpful comments and recommendation of additional data analysis techniques. We thank all survey participants. We also thank our colleagues at the University of Guelph (Guelph, Ontario, Canada), especially Dr. Michael von Massow and Jennifer Leslie, for their help and support throughout.
References
- 1. Chandler J, Rosenzweig C, Moss AJ, Robinson J, Litman L. Online panels in social science research: expanding sampling methods beyond mechanical turk. Behav Res Methods. 2019;51(5):2022–38. pmid:31512174
- 2. Hauser DJ, Paolacci G, Chandler J. Common concerns with MTurk as a participant pool: evidence and solutions. In: Kardes FR, Herr PM, Schwarz N, editors. Handbook of research methods in consumer psychology. New York: Routledge; 2019, 319–37.
- 3. Newman A, Bavik YL, Mount M, Shao B. Data collection via online platforms: challenges and recommendations for future research. Appl Psychol. 2021;70(3):1380–402.
- 4. Berinsky AJ, Frydman A, Margolis MF, Sances MW, Valerio DC. Measuring attentiveness in self-administered surveys. Public Opin Quart. 2024;88(1):214–41.
- 5. Aguinis H, Villamor I, Ramani RS. MTurk research: review and recommendations. J Manag. 2020;47(4):823–37.
- 6. Evans JR, Mathur A. The value of online surveys: a look back and a look ahead. INTR. 2018;28(4):854–87.
- 7. Turk T, Elhady MT, Rashed S, Abdelkhalek M, Nasef SA, Khallaf AM, et al. Quality of reporting web-based and non-web-based survey studies: what authors, reviewers and consumers should consider. PLoS One. 2018;13(6):e0194239. pmid:29912881
- 8. Peer E, Brandimarte L, Samat S, Acquisti A. Beyond the Turk: alternative platforms for crowdsourcing behavioral research. J Exp Soc Psychol. 2017;70:153–63.
- 9. Anson IG. Taking the time? Explaining effortful participation among low-cost online survey participants. Res Politics. 2018;5(3).
- 10. Buhrmester M, Kwang T, Gosling SD. Amazon’s mechanical turk: a new source of inexpensive, yet high-quality, data? Perspect Psychol Sci. 2011;6(1):3–5. pmid:26162106
- 11. Chandler J, Paolacci G, Hauser DJ. Data quality issues on Mechanical Turk. In: Litman L, Robinson J, editors. Conducting online research on Amazon Mechanical Turk and beyond. Thousand Oaks: Sage Academic Publishing; 2020, 95–120. https://doi.org/10.4135/9781506391151
- 12. Hauser DJ, Moss AJ, Rosenzweig C, Jaffe SN, Robinson J, Litman L. Evaluating CloudResearch’s Approved Group as a solution for problematic data quality on MTurk. Behav Res Methods. 2023;55(8):3953–64. pmid:36326997
- 13. Chmielewski M, Kucker SC. An MTurk crisis? Shifts in data quality and the impact on study results. Soc Psychol Personal Sci. 2019;11(4):464–73.
- 14. Douglas BD, Ewell PJ, Brauer M. Data quality in online human-subjects research: comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA. PLoS One. 2023;18(3):e0279720. pmid:36917576
- 15. Ward MK, Meade AW. Dealing with careless responding in survey data: prevention, identification, and recommended best practices. Annu Rev Psychol. 2023;74:577–96. pmid:35973734
- 16. Meade AW, Craig SB. Identifying careless responses in survey data. Psychol Methods. 2012;17(3):437–55. pmid:22506584
- 17. Wood D, Harms PD, Lowman GH, DeSimone JA. Response speed and response consistency as mutually validating indicators of data quality in online samples. Soc Psychol Personal Sci. 2017;8(4):454–64.
- 18. Griffin M, Martino RJ, LoSchiavo C, Comer-Carruthers C, Krause KD, Stults CB, et al. Ensuring survey research data integrity in the era of internet bots. Qual Quant. 2022;56(4):2841–52. pmid:34629553
- 19. Abbey JD, Meloy MG. Attention by design: using attention checks to detect inattentive respondents and improve data quality. J Oper Manag. 2017;53–56(1):63–70.
- 20. Oppenheimer DM, Meyvis T, Davidenko N. Instructional manipulation checks: detecting satisficing to increase statistical power. J Exp Soc Psychol. 2009;45(4):867–72.
- 21. Gummer T, Roßmann J, Silber H. Using instructed response items as attention checks in web surveys: properties and implementation. Sociol Methods Res. 2021;50(1):238–64.
- 22. Hauser DJ, Ellsworth PC, Gonzalez R. Are manipulation checks necessary? Front Psychol. 2018;9:998. pmid:29977213
- 23. Fowler C, Jiao J, Pitts M. Frustration and ennui among Amazon MTurk workers. Behav Res Methods. 2023;55(6):3009–25. pmid:36018485
- 24. Tashakkori A, Teddlie C. Integrating qualitative and quantitative approaches to research. In: Bickman L, Rog DJ, editors. The SAGE handbook of applied social research methods. Thousand Oaks: Sage Publications, Inc.; 2009, 283–317.
- 25. Kunz T, Quoß F, Gummer T. Using placeholder text in narrative open-ended questions in web surveys. J Surv Stat Methodol. 2021;9(5):992–1012.
- 26. Meitinger K, Behr D, Braun M. Using apples and oranges to judge quality? Selection of appropriate cross-national indicators of response quality in open-ended questions. Soc Sci Comput Rev. 2019;39(3):434–55.
- 27. Koralesky KE, Sirovica LV, Hendricks J, Mills KE, von Keyserlingk MAG, Weary DM. Social acceptance of genetic engineering technology. PLoS One. 2023;18(8):e0290070. pmid:37585415
- 28. Kuo C, Koralesky KE, von Keyserlingk MAG, Weary DM. Gene editing in animals: What does the public want to know and what information do stakeholder organizations provide? Public Underst Sci. 2024;33(6):725–39. pmid:38326984
- 29. Clifford S, Jerit J. Cheating on political knowledge questions in online surveys. Public Opin Q. 2016;80(4):858–87.
- 30. Motta MP, Callaghan TH, Smith B. Looking for answers: identifying search behavior and improving knowledge-based data quality in online surveys. Int J Public Opin Res. 2017; 29(4):575–603.
- 31. Smith B, Clifford S, Jerit J. TRENDS: How internet search undermines the validity of political knowledge measures. Polit Res Q. 2019;73(1):141–55.
- 32. Permut S, Fisher M, Oppenheimer DM. TaskMaster: a tool for determining when subjects are on task. Adv Methods Pract Psychol Sci. 2019;2(2):188–96.
- 33. Turnitin™ [Internet]. Turnitin; n.d. [cited 2024 Oct 15]. Available from: https://turnitin.com/
- 34. United States Census Bureau. American Community Survey: Age and Sex. 2019 [cited 2024 Oct 15]. Available from: https://data.census.gov/all?t=Age%20and%20Sex
- 35. Siegrist M. The influence of trust and perceptions of risks and benefits on the acceptance of gene technology. Risk Anal. 2000;20(2):195–203. pmid:10859780
- 36. Bronfman NC, Jiménez RB, Arévalo PC, Cifuentes LA. Understanding social acceptance of electricity generation sources. Energy Policy. 2012;46:246–52.
- 37. Chen M-F, Li H-L. The consumer’s attitude toward genetically modified foods in Taiwan. Food Qual Prefer. 2007;18(4):662–74.
- 38. Raimi KT, Wolske KS, Hart PS, Campbell-Arvai V. The aversion to tampering with Nature (ATN) scale: individual differences in (Dis)comfort with altering the natural world. Risk Anal. 2020;40(3):638–56. pmid:31613025
- 39. Bredahl L. Determinants of consumer attitudes and purchase intentions with regard to genetically modified food – results of a cross-national survey. J Consum Policy. 2001;24(1):23–61.
- 40. Hair JF Jr., Hult GTM, Ringle CM, Sarstedt M, Danks NP, Ray S. Partial least squares structural equation modeling (PLS-SEM) using R: a workbook. Cham, Switzerland: Springer; 2021. https://link.springer.com/book/10.1007/978-3-030-80519-7
- 41. Tavakol M, Dennick R. Making sense of Cronbach’s alpha. Int J Med Educ. 2011;2:53–5. pmid:28029643
- 42. Diedenhofen B, Musch J. cocron: A web interface and R package for the statistical comparison of Cronbach’s alpha coefficients. Int J Internet Sci. 2016;11(1):51–60.
- 43. Dimock M. Defining generations: where Millennials end and Generation Z begins. Pew Research Center. 2019 Jan 17 [cited 2024 Oct 15]. https://www.pewresearch.org/short-reads/2019/01/17/where-millennials-end-and-generation-z-begins/
- 44. Hauser DJ, Schwarz N. Attentive Turkers: MTurk participants perform better on online attention checks than do subject pool participants. Behav Res Methods. 2016;48(1):400–7. pmid:25761395
- 45. Muldoon J, Apostolidis P. ‘Neither work nor leisure’: motivations of microworkers in the United Kingdom on three digital platforms. New Media & Society. 2023;27(2):747–69.
- 46. Weingart P, Joubert M, Connoway K. Public engagement with science – origins, motives and impact in academic literature and science policy. PLoS One. 2021;16(7):e0254201. pmid:34234382
- 47. Scheufele DA, Krause NM, Freiling I, Brossard D. What we know about effective public engagement on CRISPR and beyond. Proc Natl Acad Sci U S A. 2021;118(22):e2004835117. pmid:34050014