
Quantifying participant distress: Validity and applicability of a distress measure to evaluate harm in quantitative assessments

  • Jess MacArthur ,

    Roles Conceptualization, Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review & editing

    jessica.macarthur@uts.edu.au

    Affiliation Institute for Sustainable Futures, University of Technology Sydney, Sydney, New South Wales, Australia

  • Ratan Budhathoki,

    Roles Investigation, Project administration, Supervision, Validation, Writing – review & editing

    Affiliation SNV Nepal, Kathmandu, Nepal

  • Min Prasad Basnet,

    Roles Investigation, Supervision, Validation

    Affiliation SNV Nepal, Kathmandu, Nepal

  • Ambika Yadav,

    Roles Investigation, Validation

    Affiliation SNV Nepal, Kathmandu, Nepal

  • Sabitra Dhakal,

    Roles Investigation, Validation

    Affiliation SNV Nepal, Kathmandu, Nepal

  • Juliet Willetts

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

    Affiliation Institute for Sustainable Futures, University of Technology Sydney, Sydney, New South Wales, Australia

Abstract

Structured interviews remain a key approach to collecting information from community members, particularly in development contexts. Such enumerated surveys often focus on potentially distressing topics including gender equality, social inclusion, wellbeing, and even socio-demographics. Researchers have an obligation to consider the ethics of survey processes and mitigate potential distress for participants. However, approaches to quantify and evaluate participant distress remain nascent outside of clinical practice. To support ethical considerations in quantitative survey deployment, we introduce a four-item formative measure to analyze interview ease, stress, privacy, and comprehension. We present the measure's conceptual and empirical development and examine the validity of the measure through data from Cambodia and Nepal (n = 4,674) using Partial Least Squares Structural Equation Modeling (PLS-SEM) for formative measurement model assessment. The measure is shown to have content and face validity, anticipated divergence with two reflective constructs, low collinearity, structural validity, and construct validity through known groups testing. As ethical considerations gain recognition across development and other research and evaluation contexts, tools to diagnose and analyze distress can help mitigate negative impacts.

Introduction

Ethical considerations in survey design continue to be explored and scrutinized in evaluation and research. Key considerations include privacy and confidentiality, utilizing clear questions, ensuring informed consent, and avoiding persuasion or pressure [1]. Ultimately, these considerations aim to uphold ethical principles and mitigate potential distress [1,2].

Within international development activities, enumerated surveys remain a popular way of collecting data at scale [3,4] and focus on answering research questions such as “how many” and “to what extent”. They are primarily conducted as structured interviews in which enumerators record responses either on paper or using digital tools, and a survey can take up to several hours to conduct. Questions are often multiple-choice, which allows the survey to produce quantitative results by converting the multiple-choice responses to numeric values. Sample sizes are often larger than in qualitative studies to ensure statistical power and generalizability of results [5].

Ethical considerations are even more important in surveys which focus on potentially sensitive topics including health, equality, inclusion, gender, wellbeing and socio-demographics. Discussing these sensitive topics can lead to reactions including acute stress, anxiety, depression, or embarrassment [6]. In tangible terms, participants can become emotional or stressed, experience a change in mood, feel uncomfortable, or be unable to finish the survey – these responses are called participant “distress” [1,2].

In the United States and Australia, distress assessments have been used in psychological research related to traumatic experiences [7–9], research coercion within vulnerable populations [9], research related to post-traumatic stress disorder [10], and clinical studies with adults and children [11–13]. These existing tools (summarized in S1 Table) range from 1 to 34 items with varying levels of measure validity testing. However, these existing tools: 1) have been designed for use in hospital settings for psychiatric, psychological and clinical research contexts, with less applicability in sociological studies; 2) rely on self-facilitated questionnaires in hospital settings, rather than enumerator-led household surveys in rural communities; and 3) have not been designed for or tested in international development contexts, including south and southeast Asia.

As such, this paper presents the validation of a rapidly designed distress measure to track and evaluate distress in enumerated quantitative surveys within sociological research in international development contexts. The article continues with a brief description of the methods used to develop and validate the distress measure as part of three survey deployments in Cambodia and Nepal (n = 4,674). The results of the validation analysis follow. We then discuss three potential use cases of the measure and reflect on its limitations. We conclude by describing opportunities to continue refining and using the measure in enumerated surveys.

Methods

The design and validation of the distress measure emerged in the context of the Water, Sanitation and Hygiene – Gender Equality Measure (WASH-GEM) which measures changing gender norms, dynamics, and structures within WASH programming [1416]. The development of the full WASH-GEM tool followed robust instrument design processes and validation procedures for social measures through rapid, exploratory and validation pilots [16,17]. As the WASH-GEM focuses on sensitive topics, it requires a high ethical standard and robust tracking of potential distress. Validating the tool’s distress items was a secondary outcome of the WASH-GEM’s development.

The catalyst for including and testing distress-related items came during the rapid pilot phase of the WASH-GEM’s development [16]. During the WASH-GEM’s cognitive interviewing process in Nepal, the researchers noted that participants – both men and women – had emotional responses to survey items not necessarily expected to cause distress, including questions related to self-efficacy, equality awareness, control over resources, and physical wellbeing. During a daily debrief, the team decided it would be important to systematically track participant distress for all survey respondents with the WASH-GEM going forward, and a module was designed overnight for the next day’s pilot deployment.

As the distress measure was developed for immediate deployment to track distress during the pilot activity, it necessitated an abridged design process. Additionally, as the measure was added to an existing survey instrument, it was not feasible to test a long list of items; instead, a short list of items was identified for testing.

The designed distress measure is formative rather than reflective, as each of the four items represents one aspect of distress and is documented through enumerator observations rather than respondent responses or self-assessment, in alignment with best practice checklists [18,19]. As such, traditional scale (reflective measure) validation tools are not relevant to this process; instead, we follow validation procedures for formative measures [20,21].

Item identification and refinement

The four items for testing were identified during a team debrief workshop in the rapid pilot phase of the WASH-GEM. The items were developed to align with the WASH-GEM’s distress protocol built on academic best practices for research on sensitive topics [22]. Through brainstorming discussions and review of the study’s distress protocol the team clarified that participant distress was caused by four key factors: the smoothness of the interview process (ease), the stress of the participant often shown through emotional reactions (stress), the privacy of the interview (privacy), and the comprehension of the participant related to the complex topics covered in the rest of the survey (comprehension). The six team members who participated in this brainstorming process included a mixed gender group of expert academics and practitioners with expertise in monitoring, evaluation, do-no-harm approaches, gender equality and social inclusion, and instrument design. The structure of the items was then drafted to mirror existing items within the wider WASH-GEM tool which was in the process of cognitive testing.

Each of the four items aimed to engage with a different domain of do-no-harm often explored within cognitive interviewing and distress protocol development. Ease explores the extent to which the interview questions flow and is related to the rapport between the respondent and interviewer [23]. Stress reflects the experiences of adverse emotional reactions that might arise during the interview [22]. Privacy relates to the experience of participants in being able to share their answers freely and openly without the influence of others – including family members [24,25]. Lastly, comprehension reflects the importance of interviewees understanding the questions and quantifies the requirement of enumerators to explain items [23].

The tested items are summarized in Table 1 with the notes provided to enumerators to clarify response options. The team additionally included a text box for any further open-response reflections from the enumerators. Responses from this item (ob5) are not included in this analysis and are a topic for future exploration. Further details on the questionnaire and WASH-GEM can be found on the WASH-GEM Learning Website (https://waterforwomen.uts.edu.au/wash-gem/).

Content and face validity

Between the rapid and validation pilots of the wider WASH-GEM tool, the distress measure’s items were reviewed in the context of relevant literature, reviewed by experts and discussed with enumerators. After the rapid pilot, the team conducted a literature review to ensure alignment with best practice in social research, which led to the refinement of the items. The items were also reviewed by a range of global and local experts (n = 22) on survey design, do-no-harm, inclusion and ethical research processes. Additionally, the team conducted three workshop discussions with enumerators (n = 40) who had enumerated many surveys with respondents in their respective contexts. These three steps helped to refine the items but did not lead to any significant changes or additions to the initial four items, only adjustments to the wording of the response options.

Validation datasets and sampling

The distress measure was validated using three cross-sectional data sets (n = 4,674) from the WASH-GEM in Cambodia and Nepal. The first two datasets were from a baseline project assessment in two Nepali districts (Sarlahi and Dailekh) and three Cambodian provinces (Kampong Thom, Prey Veng, and Kandal). The last dataset was from a midline assessment in Nepal also from Sarlahi and Dailekh. In each dataset, the survey tool aimed to interview a dyad of the male primary decision-maker (amongst men) and the female primary decision-maker (amongst women), as such the datasets included a near equal number of male and female respondents. The survey was conducted in Nepali and Khmer in Nepal and Cambodia respectively, through cross-checked translations of the tool. In-depth enumerator training was conducted in accordance with the WASH-GEM ethical guidelines including distress protocols and agreement on specific translations with regards to local dialects. The deidentified data can be found in S1 Data.

All three datasets relied on a stratified multistage random sample design, ensuring a balanced sample of women and men respondents. Within the five target districts/provinces, the team purposefully selected sub-districts/municipalities to represent the breadth of variance in program catchment areas, using Demographic and Health Survey data to identify variations in WASH status, electricity, land ownership, poverty levels, and female-headed households. Next, the team randomly selected communes/wards from lists of program working areas; lastly, the team randomly selected villages. In each commune/ward, two villages were selected: one primary and one alternate, used if just over 80 respondents could not be achieved in the primary village. At the village level, enumerators sought socio-economic variation in their selection of households. In Nepal this process was done twice, with villages randomly selected for the two different phases of data collection; however, the selected wards remained the same.

Data was pooled from the three datasets in Cambodia and Nepal and missing data was replaced using imputed means by gender and region. As only a small number of respondents identified their gender as other, a subset of male-female data was used for gender comparisons to avoid drawing unsubstantiated conclusions.
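The group-mean imputation step described above can be sketched in a few lines. This is a minimal Python illustration; the field names, codes, and values are hypothetical, not the WASH-GEM schema:

```python
import statistics

def impute_group_means(records, value_key, group_keys):
    """Replace missing values (None) with the mean of the respondent's
    gender-by-region group. Field names are illustrative only."""
    # Collect observed values within each group.
    groups = {}
    for r in records:
        if r[value_key] is not None:
            key = tuple(r[k] for k in group_keys)
            groups.setdefault(key, []).append(r[value_key])
    means = {k: statistics.mean(v) for k, v in groups.items()}
    # Fill each gap from the matching group mean.
    return [
        {**r, value_key: r[value_key] if r[value_key] is not None
         else means[tuple(r[k] for k in group_keys)]}
        for r in records
    ]

# Tiny illustration with a single gender-by-region group
rows = [
    {"gender": "f", "region": "Sarlahi", "ob1": 3},
    {"gender": "f", "region": "Sarlahi", "ob1": 1},
    {"gender": "f", "region": "Sarlahi", "ob1": None},
]
filled = impute_group_means(rows, "ob1", ("gender", "region"))
print(filled[2]["ob1"])  # mean of 3 and 1 → 2
```

In practice the grouping keys would be the gender and region variables of the pooled dataset, and the step would be repeated per item.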

Measure validation

To validate the Distress-4 measure for more widescale use, we utilized Partial Least Squares Structural Equation Modeling (PLS-SEM) and bootstrapping (5,000 iterations). Procedures broadly aligned with work by Hair et al. [26]. In this approach, a formative measure is validated by creating a structural equation model with other strategically selected variables from the WASH-GEM, allowing the model to be identified. Literature describes three ways to include additional variables to identify the model and validate a formative measure: 1) one or two reflective indicators that summarize the same concept as the formative measure; 2) two reflective constructs theoretically related to the formative measure; or 3) a combination of one global indicator and one construct [20,26]. Our datasets did not include a global indicator or alternative reflective indicators of distress, but the datasets did include a variety of other constructs within the distress measure’s nomological network; as such we have selected option two for model identification. PLS-SEM was selected as a modeling approach, as it is appropriate for a mixture of formative and reflective measures and can utilize non-normally distributed data [26]. Data analysis was conducted in RStudio including the use of the SEMinR package [26,27].
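The bootstrapping step can be illustrated in miniature. The actual analysis used the SEMinR package in R; the Python sketch below shows only the percentile-bootstrap idea applied to a correlation-style statistic, with illustrative data and iteration count:

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def bootstrap_ci(x, y, stat=pearson, n_boot=5000, alpha=0.05, seed=1):
    """Percentile bootstrap: resample respondent pairs with replacement
    and re-estimate the statistic each iteration."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([x[i] for i in sample],
                              [y[i] for i in sample]))
    estimates.sort()
    return (estimates[int(n_boot * alpha / 2)],
            estimates[int(n_boot * (1 - alpha / 2)) - 1])

# Synthetic negatively related variables (e.g. a construct score vs distress)
rng = random.Random(0)
x = [rng.random() for _ in range(200)]
y = [-xi + 0.3 * rng.random() for xi in x]
lo, hi = bootstrap_ci(x, y, n_boot=1000)
print(lo, hi)  # the whole interval should sit below zero
```

The PLS-SEM bootstrap re-estimates weights, loadings, and path coefficients the same way, resampling whole respondent records 5,000 times.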

Reflective Variable Identification. The model was identified using two theoretically and statistically related reflective constructs [20]. The wider WASH-GEM tool comprises 17 validated measures including a range of both formative and reflective constructs [16]. For six of the WASH-GEM’s most valid reflective constructs, there is a theoretical causal relationship with distress; lower scores in these themes theoretically lead to increased distress (Household Influence, Household Autonomy, Self-efficacy, Collective Action, Equality Awareness and Mobility). This pattern was also observed during the WASH-GEM’s rapid pilot, in that individuals with lower scores in these measures were observed to have increased distress. To select two reflective constructs for model identification, intra-measure Pearson correlation coefficients were calculated and plotted on a correlation matrix for the six relevant measures and a simple sum version of the distress measure. From these six measures, two were selected for the model. In a slight variation from existing practice, instead of using the reflective constructs as outcome variables, the directionality was reversed to better align with theoretical and observed insights. The results of this reversed model were cross-checked against the initial model to ensure this choice did not impact the validation results.

Divergent Validity. The two reflective constructs used to identify the PLS-SEM model were expected to have weak-to-moderate divergence from distress measure scores. As part of the model, path coefficients were calculated for each of the two connections, estimating that there would be weak-to-moderate negative path coefficients (β ≤ −0.20, p ≤ 0.01) between the reflective constructs and the formative construct [27].

Item Collinearity. For the measure to effectively operate formatively, there should not be a high level of collinearity between the items. Item collinearity was first assessed through intra-item correlations and plotted on a correlation matrix. Item collinearity was next assessed as part of the PLS-SEM model through variance inflation factors (VIF). A range of 0.2 to 0.6 was adopted as the thresholds for the intra-item correlations, and a threshold of 3 was adopted for the VIF values [26] to indicate low collinearity – required for a formative measure.
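The VIF check can be computed directly from an item correlation matrix, since the j-th VIF equals the j-th diagonal entry of the inverse of that matrix. A stdlib Python sketch follows; the correlation matrix shown is illustrative, not the study's estimates:

```python
def invert(m):
    """Gauss-Jordan inversion of a small square matrix."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting for numerical stability
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def vifs(corr):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]

# Illustrative intra-item correlations for the four items
corr = [
    [1.00, 0.45, 0.30, 0.50],
    [0.45, 1.00, 0.28, 0.40],
    [0.30, 0.28, 1.00, 0.35],
    [0.50, 0.40, 0.35, 1.00],
]
for name, v in zip(["ease", "stress", "privacy", "comprehension"], vifs(corr)):
    print(name, round(v, 2), "OK" if v < 3 else "flag")
```

A VIF at or below 3 for every item, as here, is the low-collinearity condition required of a formative measure.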

Item Significance and Relevance. Item significance and relevance were assessed as part of the PLS-SEM model for each of the distress measure’s four items. The statistical significance of each item’s weight was assessed (p ≤ 0.05) alongside loadings, with loadings ≥ 0.50 justifying the inclusion of the item [26]. For items with statistically insignificant weights and loadings ≤ 0.50, inclusion could be justified through statistical significance of the loading and theoretical relevance [26].

Structural Model. Although the main aim of the analysis was to assess the validity of the distress measure, the model also tested the influence of the two reflective constructs (Equality Awareness and Self-efficacy) on respondent distress for the case study dataset. From theory and observation, it was anticipated that the overall model would have a weak coefficient of determination (0.20 < r2 < 0.30) for the study dataset. In alignment with PLS-SEM best practices, model fit statistics are not required; we instead rely on the path coefficients and the coefficient of determination [26].

Scoring. A comparison was done to identify which scoring approach would be most appropriate – a simple sum or weighted sum. While weighted sums from regressions are more accurate, there is a strong benefit to project teams in being able to quickly score responses, without the use of statistical analysis software. As such, we assessed the correlation between the weighted sum through the PLS-SEM model and a simple sum scoring. A Pearson correlation coefficient greater than 0.90 with p ≤ 0.01 was deemed to provide sufficient rationale for a simple sum score. This was also tested for each of the three datasets by gender.
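The comparison of the two scoring approaches reduces to a single correlation between two score vectors. A minimal Python sketch with hypothetical item responses and hypothetical weights (not the estimated PLS-SEM weights):

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Each row holds the four item scores (ease, stress, privacy, comprehension);
# both the responses and the weights below are illustrative.
responses = [
    [1, 2, 1, 1], [3, 3, 2, 3], [2, 1, 2, 2], [4, 4, 3, 4], [2, 3, 2, 1],
    [1, 1, 1, 2], [3, 2, 3, 3], [4, 3, 4, 4], [2, 2, 1, 2], [3, 4, 3, 3],
]
weights = [0.35, 0.30, 0.10, 0.25]

simple = [sum(r) for r in responses]
weighted = [sum(w * v for w, v in zip(weights, r)) for r in responses]
r = pearson(simple, weighted)
print(round(r, 3))  # a high correlation supports the simple-sum shortcut
```

When the two score versions correlate above the chosen 0.90 threshold, field teams can use the simple sum without statistical software.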

Construct Validity. Construct validity explored the extent to which the measure demonstrated the expected theoretical and empirical relations, using the simple sum version of the distress measure. Using known groups t-testing, we tested the hypothesis that poorer, older, and less educated people would have higher levels of distress than their wealthier, younger and more educated counterparts. These groups were identified through observations of the types of people who experienced distress during the rapid pilot and discussions with the program managers and enumerators.
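A known groups comparison of this kind can be sketched as a Welch's two-sample t-test. The example below uses a normal approximation for the p-value, which is reasonable at this study's sample sizes; the group scores are entirely hypothetical:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic (unequal variances) with a two-sided
    normal-approximation p-value, suitable for large groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal approximation
    return t, p

# Hypothetical normalized distress-measure scores (lower = more distress)
older = [1.0, 1.2, 0.9, 1.1] * 50
younger = [2.0, 2.1, 1.9, 2.2] * 50
t, p = welch_t(older, younger)
print(t < 0, p < 0.001)  # older group scores significantly lower
```

A negative t with a small p-value here would correspond to the hypothesized pattern of more distress among older respondents.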

While not included in this article due to ethical constraints, the same validation procedures were also conducted on a larger sample (total n = 6,025) with additional responses from Bhutan, Laos, and Ghana. The same overall validation results were found in this larger sample.

Ethical considerations

The study was approved in two phases by the University of Technology Sydney (UTS HREC REF NO. ETH18–2599 – Project 17232 and 21051). The first phase covered datasets 1 and 2, and the second phase covered dataset 3. The studies aligned with the WASH-GEM do-no-harm processes and distress protocols. Informed consent was obtained verbally from each participant prior to the start of each interview, as indicated in the approved consent procedures, and recorded digitally through a checkbox checked by the enumerator. Additional information regarding the ethical, cultural, and scientific considerations specific to inclusivity in global research is included in the Supporting Information (S1 Checklist).

Results

Sociodemographic characteristics

Sociodemographic characteristics of the three datasets are found in Table 2. The participants came from five different provinces/districts of Cambodia and Nepal and were a near even split between male and female respondents. Overall, nearly half of the respondents had a preschool-level education or less and were between 31 and 49 years of age. Wealth quintiles were calculated using principal component analyses within each country and dataset using a series of asset-based questions. There was minimal missing data for the socio-demographic responses.
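Once each household has an asset index score (here standing in for the first principal component of the asset questions), the quintile assignment itself is a simple ranking step. A Python sketch with hypothetical scores:

```python
def wealth_quintiles(scores):
    """Assign each household a wealth quintile (1 = poorest, 5 = wealthiest)
    by ranking its asset index score. Scores are illustrative stand-ins
    for the first principal component of the asset-based questions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    quintile = [0] * len(scores)
    for rank, i in enumerate(order):
        quintile[i] = min(5, rank * 5 // len(scores) + 1)
    return quintile

scores = [0.2, -1.3, 0.8, 2.1, -0.4, 1.0, -2.0, 0.1, 1.7, -0.9]
print(wealth_quintiles(scores))
```

Within the study, this bucketing was performed separately per country and dataset so quintiles are relative to each sample.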

Table 2. Sociodemographic characteristics of the study participants in Cambodia and Nepal.

https://doi.org/10.1371/journal.pone.0326957.t002

Distress measure items

The four identified items for the distress measure relate to interview ease (ob1), stress (ob2), privacy (ob3), and comprehension (ob4) as introduced in Table 3. Overall, the majority of enumerators identified that interviews were easy or very easy to conduct, that the participants did not display physical signs of stress, that the interviews were private and that comprehension was quite high. However, all four items did have responses that indicated distress of participants.

Table 3. Distress measure items across the datasets.

https://doi.org/10.1371/journal.pone.0326957.t003

Intra-item and measure collinearity

Intra-measure and intra-item correlation coefficients were calculated and plotted on a correlation matrix (Fig 1), with two objectives: 1) to identify potentially appropriate WASH-GEM measures to use within the PLS-SEM model and 2) to explore intra-item correlations of the four distress measure items. Of the six reflective WASH-GEM measures tested, four measures had the required divergence from Distress with correlation coefficients less than −0.15, p < 0.05 (Household Influence, Self-efficacy, Collective Action and Equality Awareness) indicating potential appropriateness for use in the PLS-SEM model. Further testing within the PLS-SEM model and review of notes from the observation data from the rapid pilot narrowed this down to two appropriate measures for identification (Self-efficacy with five items and Equality Awareness with seven items). Additionally, the intra-item correlation coefficients for the four distress measure items ranged from 0.28 to 0.58 (p < 0.03), falling within the anticipated thresholds and indicating low collinearity. A full correlation matrix of all 17 WASH-GEM themes can be found in S1 Fig.

Fig 1. Intra measure and item correlation matrix indicating correlation coefficients.

Explored WASH-GEM Measures: Household Influence, Household Autonomy, Self-efficacy, Collective Action, Equality Awareness and Mobility. Matrix also includes a simple sum score of the distress measure and the four distress measure items (ease, stress, privacy and comprehension).

https://doi.org/10.1371/journal.pone.0326957.g001

PLS-SEM model and distress measure validity

The PLS-SEM model was identified using the two selected reflective measures as illustrated in Fig 2. For both the reflective Equality Awareness and Self-efficacy measures, all items were statistically significant. All but one item had sufficiently large loadings (ea7 loading of 0.191). However, as the validation of these two reflective measures was not the purpose of this analysis and they have been validated elsewhere previously [16], the item was not dropped. Equality Awareness and Self-efficacy had Cronbach’s alpha coefficients of 0.90 and 0.88 respectively, indicating that these measures are internally consistent and applicable for identifying the model used to test the validity of the Distress-4 measure.

Fig 2. PLS-SEM model illustration indicating measure weights/loadings, path coefficients, and the model’s coefficient of determination.

https://doi.org/10.1371/journal.pone.0326957.g002

For the formative Distress-4 measure, all items had VIF scores under the threshold of 3 (ranging from 1.180 to 1.771), indicating low collinearity, as also illustrated in the intra-item correlations. Three of the four distress items had statistically significant weightings as indicated in Fig 2, and the same three items had loadings greater than 0.5 (0.79 to 0.85, p < 0.01). The exception was ob3_privacy, which had a loading of 0.40 (p < 0.01). Such a result prompts reflection on the item's importance to the measure's conceptualization; however, as the privacy item is central to our conceptualization of distress, we have chosen to retain it within the measure. These same broad results also held for a non-bootstrapped model.

The path coefficients between Equality Awareness and Distress (β = −0.198, p < 0.001) and Self-efficacy and Distress (β = −0.396, p < 0.001) both performed as anticipated, indicating a weak negative influence of the two reflective constructs on distress. As anticipated, the overall model indicates a weak influence of the reflective measures on distress for our dataset (r2 = 0.203). Using these two divergent constructs enabled us to identify a model and test the validity of the formative construct; together these results indicate that the distress measure is valid for use as a formative construct. Additionally, a reversed PLS-SEM model was developed (S2 Fig) with Equality Awareness and Self-efficacy as outcome variables (β ranged from −0.405 to −0.217, p < 0.001), with the same broad results for the Distress-4 construct.

Scoring

The most appropriate scoring approach was identified by correlating two versions of the Distress-4 construct scores: 1) calculated through PLS-SEM (regression weighting) and 2) using a simple sum scoring approach. The scores correlated at r = 0.96, p < 0.001 (r = 0.93; 0.90–0.99 when disaggregating by gender and year/country), indicating that a simple score approach is justifiable. As this is much easier for project teams to calculate, we continue the analysis with the simple score approach.

Construct validity through known groups

Construct validity was explored at both the measure (Fig 3) and item (Fig 4) levels through known groups analysis of the pooled data; the groups were chosen based on the observations from the rapid pilot. Notably, at the measure level, older (p < 0.001), poorer (p < 0.001), and less educated (p < 0.01) people all had statistically significantly lower scores (more distress) than their younger, more educated, and wealthier counterparts, as illustrated in the violin plots of Fig 3. However, there was not a statistically significant difference between genders – an anticipated result based on other WASH-GEM studies and the rapid pilot observations.

Fig 3. Violin plots of normalized distress measure scores by gender, age, education level and wealth quintile indicating statistical significance.

https://doi.org/10.1371/journal.pone.0326957.g003

Fig 4. Violin plots of normalized item scores by gender, age, education level and wealth quintile indicating statistical significance.

https://doi.org/10.1371/journal.pone.0326957.g004

At the item level, there were statistically significant differences across all known groups (gender, age, education, and wealth) for ease, and across some known groups for stress (age, wealth), privacy (gender, education, wealth), and comprehension (gender, age, wealth). Notably, there were statistically significant differences for all four items across wealth quintiles.

Discussion

Summary and interpretation

This study aimed to examine the validity of a rapidly developed distress measure for use in enumerator-led structured interview surveys – a key tenet of international development programming and research. The measure was developed in the context of fieldwork and implementation of a wider study. A PLS-SEM model was identified using two reflective constructs to explore the Distress-4 measure’s validity. The four items were shown to form a valid measure as assessed through formative measure validation procedures [26,28]: content and face validity; divergent validity; assessment of collinearity; structural validity of the measure’s internal model; and construct validity through known groups testing. Overall, the measure provides a validated snapshot of four important do-no-harm assumptions related to ease, stress, privacy, and comprehension.

Additionally, the PLS-SEM model illustrates the weak but significant influence of Equality Awareness and Self-efficacy on Distress within the case study dataset. This highlights that, within our dataset, lower gender equality scores led to increased distress. We identify three potential reasons for these connections. As the survey explored aspects of equality in detail through multiple questions, it is possible that this repetition of potentially uncomfortable gender equality focused questions is associated with increased distress. It is also possible that respondents who showed lower equality awareness picked up on subtle cues from enumerators that their responses were not aligned with a ‘positive’ direction. Lastly, lower equality awareness participants were more likely to be confused by phrasing and words related to gender equality, as they may have been less exposed to such language. As such, we speculate that ‘lower performance’ on any survey tool could also theoretically lead to increased distress.

The distress measure is, to the best of our knowledge, the only tool developed to measure distress in contexts outside of clinical practice and outside of the United States and Australia, with our testing taking place in two Asian countries. As many international development surveys focus on potentially sensitive topics, the sector has an obligation to track, mitigate and reflect on the potential harm of research. Some scholars even take this further to call for research which actively aims to ‘do-more-good’ in a transformative approach to research and evaluation [29]. We now reflect on three use cases for the distress measure.

During deployment

During a survey deployment the distress measure can be used to track distress by the enumeration team. Distress data can be collated on-the-spot and used as a part of the daily debriefing to identify cases of distress that may require further referral to supporting services. Such referral approaches are best practice in research on sensitive topics [22]. There are also opportunities during these debriefs to revise wording or re-order questions to mitigate stress and concerns. However, it is important to create a safe debriefing space to ensure that enumerators feel comfortable and empowered to accurately report distress.
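As a sketch of how such on-the-spot collation might look, the snippet below flags low-scoring interviews for discussion at the daily debrief; the item coding (lower sum = more distress), the threshold, and the respondent identifiers are all hypothetical:

```python
def flag_for_followup(interviews, threshold=6):
    """Collate the day's four distress items (ob1-ob4) and flag
    low-scoring interviews for debrief discussion and possible
    referral. Threshold and coding are illustrative."""
    flagged = []
    for record in interviews:
        score = sum(record[item] for item in ("ob1", "ob2", "ob3", "ob4"))
        if score <= threshold:
            flagged.append((record["respondent_id"], score))
    return flagged

# One day's (hypothetical) enumerator observations
day1 = [
    {"respondent_id": "N-014", "ob1": 1, "ob2": 1, "ob3": 2, "ob4": 1},
    {"respondent_id": "N-015", "ob1": 4, "ob2": 3, "ob3": 4, "ob4": 4},
]
print(flag_for_followup(day1))  # [('N-014', 5)]
```

Any such automation should supplement, not replace, the enumerator's own judgement and the team's referral protocol.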

After deployment

After a survey deployment, within an after-action-review [30], distress measure results can be used to analyze which types of individuals were more likely to experience distress. This type of analysis can explore intersectional aspects such as age, ethnicity, education level and poverty. At this level, analysts can also explore if certain enumeration teams were more likely to report distress. Lessons from the after-action-review should be shared with enumeration teams as well as research and evaluation team members to identify ways to reduce harm in future surveys.

Within reporting

Lastly, we argue that studies on sensitive topics should report distress results as part of their wider do-no-harm strategies for transparency and reflection. Within the positivist and post-positivist paradigms that dominate much research on sensitive topics [31], research is seen as a neutral activity in which enumerators are there solely to collect information from communities. However, there is increasing evidence that the very act of asking questions can raise the critical consciousness of participants and lead to both positive and negative outcomes [32]. Nevertheless, many researchers on sensitive topics are not cognizant of the potential harm of their research and as such do not reflect on its potential negative (and positive) implications. A paradigm shift towards a transformative perspective of research and evaluation would support stronger do-no-harm approaches and enable researchers to purposely pursue positive impacts. This is particularly important for research conducted in international development contexts, but also relevant in other settings.

Limitations and future research

The validation and value of this measure must be understood alongside its limitations relating to its rapid development and novelty.

First, although the items were developed and interrogated with a group of expert practitioners and academics, there was neither the time nor the opportunity to identify a wider pool of candidate items or to apply other measure development procedures. As such, the four items tested in this article were the only items tested. Nonetheless, these four items passed all of the validation procedures applied.

Second, as the measure relies on enumerator observation rather than questions asked directly of respondents, there is a risk of under-reporting by enumerators. Care must be taken during enumerator training to explain the purpose of distress tracking accurately, and a random sample of interviews should ideally be observed to encourage accurate reporting.

Third, as the measure was developed rapidly, there has not yet been an opportunity to test criterion validity by comparison with similar reflective measures asked directly of respondents, such as the interview distress assessment [10] or the reactions to research measure [12]. Additionally, procedures such as test-retest reliability would not be appropriate for a measure that seeks to capture participants' experiences of a specific survey deployment.

Future work should continue to test and refine this measure for use in wider contexts and across different types of surveys on sensitive topics.

Conclusions

Within this article, we have presented the development and validation of a distress measure for the international development sector and for research and evaluation more broadly. The four-item construct measures participant distress in survey deployments on sensitive topics. Research and evaluation remain critical components of development interventions, and as such it is important that these activities do not cause harm to participants. Ultimately, the Distress-4 provides a simple approach to track, mitigate, and reflect on the potential harm of research, with the aim of creating research that does more good.

Supporting information

S1 Data. Deidentified Cleaned Dataset.

Compiled dataset for the three rounds of data collection with scored measures for WASH-GEM themes and selected items. Response options in English translation have been harmonized across three rounds of data collection.

https://doi.org/10.1371/journal.pone.0326957.s001

(CSV)

S1 Analysis Code. R-Studio Analysis Code.

Simplified code used for the distress measure validation analysis.

https://doi.org/10.1371/journal.pone.0326957.s002

(RMD)

S1 Fig. Correlation matrix of all 17 WASH-GEM themes and the four distress items.

https://doi.org/10.1371/journal.pone.0326957.s003

(TIF)

S2 Fig. PLS-SEM model with reflective measures as outcome variables.

https://doi.org/10.1371/journal.pone.0326957.s004

(TIF)

S1 Table. Similar distress measures.

A selection of existing measures to explore distress or similar aspects for research participants.

https://doi.org/10.1371/journal.pone.0326957.s005

(DOCX)

S1 Checklist. Inclusivity in global research checklist.

https://doi.org/10.1371/journal.pone.0326957.s006

(DOCX)

Acknowledgments

We are grateful to all the enumerators who participated in this study and helped to clarify the importance of tracking distress. We are also grateful to all participants for their insights. Additionally, we thank the editor and reviewers of this manuscript, whose feedback has helped shape the direction of our analysis.

References

  1. Labott SM, Johnson TP, Fendrich M, Feeny NC. Emotional risks to respondents in survey research. J Empir Res Hum Res Ethics. 2013;8(4):53–66. pmid:24169422
  2. Labott SM, Johnson TP, Feeny NC, Fendrich M. Evaluating and addressing emotional risks in survey research. Surv Pract. 2016;9(1):1–9.
  3. Development Initiatives. Household surveys factsheet. 2017. Available: http://devinit.org/wp-content/uploads/2017/07/Key-facts-on-household-surveys.pdf
  4. Kumar K. Conducting mini surveys in developing countries. 2006.
  5. White H, Raitzer DA. Impact Evaluation of Development Interventions: A Practical Guide. 2017. Available: https://www.adb.org/sites/default/files/publication/392376/impact-evaluation-development-interventions-guide.pdf
  6. Jorm AF, Kelly CM, Morgan AJ. Participant distress in psychiatric research: a systematic review. Psychol Med. 2007;37(7):917–26. pmid:17224097
  7. Parslow RA, Jorm AF, O’Toole BI, Marshall RP, Grayson DA. Distress experienced by participants during an epidemiological survey of posttraumatic stress disorder. J Trauma Stress. 2000;13(3):465–71. pmid:10948486
  8. Walker EA, Newman E, Koss M, Bernstein D. Does the study of victimization revictimize the victims? Gen Hosp Psychiatry. 1997;19(6):403–10. pmid:9438184
  9. Dugosh KL, Festinger DS, Croft JR, Marlowe DB. Measuring coercion to participate in research within a doubly vulnerable population: initial development of the coercion assessment scale. J Empir Res Hum Res Ethics. 2010;5(1):93–102. pmid:20235867
  10. Griffin MG, Resick PA, Waldrop AE, Mechanic MB. Participation in trauma research: is there evidence of harm? J Trauma Stress. 2003;16(3):221–7. pmid:12816333
  11. Kassam-Adams N, Newman E. The reactions to research participation questionnaires for children and for parents (RRPQ-C and RRPQ-P). Gen Hosp Psychiatry. 2002;24(5):336–42. pmid:12220800
  12. Newman E, Willard T, Sinclair R, Kaloupek D. Empirically supported ethical research practice: the costs and benefits of research from the participants’ view. Account Res. 2001;8(4):309–29. pmid:12481796
  13. Joffe S, Cook EF, Cleary PD, Clark JW, Weeks JC. Quality of informed consent: a new measure of understanding among research subjects. J Natl Cancer Inst. 2001;93(2):139–47. pmid:11208884
  14. Carrard N, MacArthur J, Leahy C, Soeters S, Willetts J. The water, sanitation and hygiene gender equality measure (WASH-GEM): Conceptual foundations and domains of change. Womens Stud Int Forum. 2022;91:102563.
  15. Gonzalez D, Abdel Sattar R, Budhathoki R, Carrard N, Chase RP, Crawford J, et al. A partnership approach to the design and use of a quantitative measure: Co-producing and piloting the WASH gender equality measure in Cambodia and Nepal. Development Studies Research. 2022;9(1):142–58.
  16. MacArthur J, Chase RP, Gonzalez D, Kozole T, Nicoletti C, Toeur V, et al. Investigating impacts of gender-transformative interventions in water, sanitation, and hygiene: Structural validity, internal reliability and measurement invariance of the water, sanitation, and hygiene–Gender equality measure (WASH-GEM). PLOS Water. 2024;3(10):e0000233.
  17. Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR, Young SL. Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer. Front Public Health. 2018;6:149. pmid:29942800
  18. Fleuren BPI, van Amelsvoort LGPM, Zijlstra FRH, de Grip A, Kant Ij. Handling the reflective-formative measurement conundrum: a practical illustration based on sustainable employability. J Clin Epidemiol. 2018;103:71–81. pmid:30031210
  19. Jarvis CB, MacKenzie SB, Podsakoff PM. A Critical Review of Construct Indicators and Measurement Model Misspecification in Marketing and Consumer Research. J Consum Res. 2003;30(2):199–218.
  20. Diamantopoulos A, Riefler P, Roth KP. Advancing formative measurement models. Journal of Business Research. 2008;61(12):1203–18.
  21. Hair JF, Hult GT, Ringle C, Sarstedt M. A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM). Sage; 2017.
  22. Draucker CB, Martsolf DS, Poole C. Developing distress protocols for research on sensitive topics. Arch Psychiatr Nurs. 2009;23(5):343–50. pmid:19766925
  23. Scott K, Ummer O, LeFevre AE. The devil is in the detail: reflections on the value and application of cognitive interviewing to strengthen quantitative surveys in global health. Health Policy Plan. 2021;36(6):982–95. pmid:33978729
  24. Allmark P, Boote J, Chambers E, Clarke A, McDonnell A, Thompson A, et al. Ethical Issues in the Use of In-Depth Interviews: Literature Review and Discussion. Res Ethics. 2009;5(2):48–54.
  25. Streib G. Privacy in the research interview: Issues related to design, field tactics and analysis. J Comp Fam Stud. 1973;4:276–85.
  26. Hair JF, Hult GTM, Ringle CM, Sarstedt M, Danks NP, Ray S. Partial Least Squares Structural Equation Modeling (PLS-SEM) Using R. Cham: Springer International Publishing; 2021. https://doi.org/10.1007/978-3-030-80519-7
  27. Hair J, Alamer A. Partial Least Squares Structural Equation Modeling (PLS-SEM) in second language and education research: Guidelines using an applied example. Research Methods in Applied Linguistics. 2022;1(3):100027.
  28. Diamantopoulos A, Winklhofer HM. Index Construction with Formative Indicators. Journal of Marketing Research. 2001;38:269–77.
  29. Mertens DM. Transformative research and evaluation. New York: Gilford; 2009.
  30. Ramalingam B. Tools for Knowledge and Learning. Overseas Development Institute; 2006. Available: http://www.odi.org/sites/odi.org.uk/files/odi-assets/publications-opinion-files/188.pdf
  31. MacArthur J, Carrard N, Willetts J. Exploring gendered change: concepts and trends in gender equality assessments. Third World Quarterly. 2021;42(9):2189–208.
  32. Mertens DM, Catsambas T. Ethical Practice through a Transformative Lens and Methodological Implications in Evaluation. Ethics for Evaluation. 2021. p. 164–87. https://doi.org/10.4324/9781003247234-11