Abstract
Background
A growing body of research indicates that sex (biological) and gender (sociocultural) influence health through a variety of distinct mechanisms. Sex- and Gender-Based Analysis (SGBA) techniques could be used to examine these influences; however, there is a lack of nuanced and easily implementable measurement tools for health research. To address this gap, we created the Sex- and Gender-Based Analysis Tool – 5 item (SGBA-5).
Objectives
This research aims to assess the validity and reliability of the SGBA-5 for use in health sciences research where sex or gender are not primary variables of interest.
Methods
A Delphi consensus study was conducted with Canadian researchers (n = 14). In each round, the Delphi experts rated the validity of each SGBA-5 item on a 5-point Likert scale; after the first round, they also received summary statistics of the other experts’ responses. A conservative threshold for consensus agreement (75% of experts rating an item 4 or 5 out of 5) was used given the novelty of this scale’s items. Reliability was assessed through a two-armed test-retest study: the university student arm (n = 89) was conducted in person on paper, and the older adult arm (n = 71) was conducted online.
Results
The Delphi study ended after three rounds; experts reached consensus agreement on the validity of the biological sex item of the SGBA-5 (93%) and consensus non-agreement on each of the gendered aspect of health items (identity: 64%, expression: 64%, roles: 50%, relations: 57%). In both the student arm and the older adult arm of the test-retest study, the sex item showed complete test-retest agreement and all four gendered items had intraclass correlation coefficients above the a priori threshold, indicating that all items were reliable.
Citation: Putman A, Cole A, Dogra S (2025) Initial validity and reliability testing of the SGBA-5. PLoS One 20(5): e0323834. https://doi.org/10.1371/journal.pone.0323834
Editor: Pasyodun Koralage Buddhika Mahesh, Ministry of Health, Sri Lanka, SRI LANKA
Received: April 1, 2024; Accepted: April 16, 2025; Published: May 16, 2025
Copyright: © 2025 Putman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data cannot be shared publicly because of the privacy concern of the potential for participant re-identification from disaggregated anonymized data. Data are available from the authors upon approval from the Ontario Tech Research Ethics Board (contact via researchethics@ontariotechu.ca) to release data to researchers who meet the criteria for access to confidential data.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In health sciences research, Sex- and Gender-Based Analysis (SGBA) is an umbrella term for the collection of research methods and analytic techniques that can provide insight into how sex and gender interact with health. In this context, sex is defined as biological characteristics in living beings that relate to sexual reproduction, and gender is defined as the socio-cultural expectations, roles, expressions, and identities that are associated with women, men, and gender diverse people [1]. Many leading organizations, including the Canadian tri-council funding agencies, the National Institutes of Health (US), the European Association of Science Editors, and the World Health Organization, have encouraged and prioritized the systematic inclusion of SGBA. In fact, all of them have created specific policies promoting the inclusion of SGBA in health research [2–9].
Despite these collective efforts, the integration of SGBA into health research has not been universal. For example, in 2009 the Canadian Institutes of Health Research (CIHR) began implementing its 10-year SGBA action plan, which required all grant applicants to complete SGBA training modules before submitting their applications [2,10,11]. Haverfield and Tannenbaum (2021) analyzed more than 39,000 grant applications to the CIHR from 2011 to 2019 and found that while the inclusion of sex-based analysis increased from 22% of grant applications in 2011 to 83% in 2019, the inclusion of gender-based analysis rose only from 12% to 33% over the same period [10]. This comparatively small increase suggests that researchers find it harder to integrate gender-based analysis into their study designs than sex-based analysis. Haverfield and Tannenbaum further noted that despite completing the required SGBA training modules, a portion of both the grant applicants and the grant evaluators demonstrated a lack of comprehension of core SGBA principles, such as conflating sex and gender [10]. This gap between the SGBA implementation goals of funding agencies and research entities and the actual integration of SGBA by researchers into individual studies and grant applications is influenced by many factors, including the limited number of valid and reliable tools that can be used for SGBA.
Several measurement tools that aim to assess the effects of sex and/or gender in a health context have been created over the past 50 years. However, these tools have several limitations: they can be lengthy, invasive, overly reliant on stereotypes, offensive, demeaning, or based on outdated conceptions of sex and gender [6,12,13]. A more detailed analysis of the strengths and limitations of existing tools is available elsewhere [14]. More specifically, there is currently a lack of valid and reliable tools that are concise while allowing for differentiation between biological sex and gender as a social determinant of health in research where sex or gender are not the primary focus. For example, in 2022 the National Academy of Sciences (US) recommended that researchers use a two-step method of nominal categorical responses in which participants report their sex assigned at birth and their gender identity separately [15]. This recommendation is supported by evidence of the two-step method’s validity and reliability for use in population-level and census questionnaires [15–17]. Unfortunately, such categorical indicator variables can only provide meaningful differentiation between sex-based and gender-based effects in studies with very large sample sizes, an inherent limitation of disaggregation analyses. Disaggregation-based analyses do not allow for the detailed insight that is possible from more complex conceptualizations of sex and gender, which require more intricate scale measures or qualitative work [18]. Thus, while nominal categories and disaggregated analysis in SGBA can provide some insight into differences between sex and/or gender categories in large, population-scale surveys, they cannot provide the granular detail needed to investigate the multidimensional aspects and within-category variation that are inherent in social constructs like gender [6].
To address this gap, we created the Sex- and Gender-Based Analysis Tool – 5 Item (SGBA-5). This tool is proposed as one way to conduct SGBA in health research studies based on current evidence of how sex and gender influence health but is certainly not the only way to do so. There are a multitude of ways to model and conceptualize sex and gender, and how they impact health; the SGBA-5 is one of many such tools that are needed to address the breadth of potential SGBA implementations in the health sciences [2,6,12]. Thus, the purpose of this study was to assess the validity and reliability of the SGBA-5 for use in health sciences research where sex or gender are not primary variables of interest.
Methods
The first iterations of the SGBA-5 were developed alongside a thorough literature review and small-group feedback. The more formal assessment of the SGBA-5 began with a Delphi expert consensus study on the SGBA-5’s validity, and then continued with a test-retest study to assess its reliability. These steps and their relation to the steps in creation of a novel measurement tool (scale) are visualized in Fig 1 [18–20].
This diagram shows the steps involved in novel scale creation, beginning with Item Development and then progressing to an ongoing cycle of Scale Development and Scale Assessment. This cycle of Scale Development and Scale Assessment continues alongside use of the scale to ensure the scale’s validity and reliability as well as to assess the suitability of the scale’s use in different contexts. The diagram also indicates the steps of novel scale development associated with the initial design and testing of the SGBA-5 described in this paper.
Initial design and development
As is typical with the development of a scale that does not have direct comparator scales to draw upon, potential items for inclusion in the SGBA-5 were drawn from extensive reviews of the peer-reviewed and grey literature relating to biological sex, gendered aspects of health, measurement of multidimensional health impacts, continuous scale creation, and scale validation methodologies [8,12,15,18,19,21–29]. One of the key takeaways from this earlier investigation was that at present, while there is theoretical backing for a variety of ways in which to operationalize measurement of aspects of gender that may influence health [2,6,12], there is not a similar theoretical background for the measurement of different aspects of biological sex in general health research.
The SGBA-5 consists of a categorical item for biological sex (male/female/intersex) and four gendered aspect of health constructs measured on a visual analogue scale that depicts a feminine-masculine continuum. The wording and type of measurement for each SGBA-5 item are presented in Table 1 and in S1 File. The categorical biological sex item was selected because the authors could not find a theoretical foundation for a more complex understanding of biological sex that is both appropriate for use in a general population and answerable on a questionnaire without a priori knowledge of specific biological test results. A short list of potential gendered aspects of health to include in the SGBA-5 was derived from a selection of the academic literature, white papers, and SGBA policies identified in the initial background literature review [2,6,8,12,13,26]. The four gender constructs included in the SGBA-5 were chosen because evidence in the literature supports their proposed pathways of health impact (gender identity [30,31], gender expression [32,33], gender roles [34,35], and gendered relations [36,37]). Institutionalized gender was considered as a gendered aspect of health but was not included in the SGBA-5, because institutional-level impacts are best assessed at a legislative or community level rather than at the individual level [6,12].
As the CIHR Institute of Gender and Health notes in its definitions of sex and gender in the context of health research, “[gender] is not confined to a binary (girl/woman, boy/man) nor is it static; it exists along a continuum and can change over time” [1]. To capture more of this variation than nominal items alone could provide, feminine-masculine analogue (continuous) scales were used to represent the four gendered aspects of health constructs included in the SGBA-5; these scales are not assumed to have true zero values. We imposed this interpretive constraint on the SGBA-5’s analogue measures to mitigate potential over-interpretation of differences between groups or individuals who complete the SGBA-5 as part of a health study where sex or gender are not primary variables of interest. Despite this constraint, these measures can still provide more information on group- and individual-level variation than would be possible from a nominal categorization item alone [18].
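To illustrate the interval-only interpretation described above, the following Python sketch converts a mark on an analogue line into a position expressed as a percentage of the feminine-masculine continuum and treats only differences between positions as meaningful. It is a hypothetical illustration: the 100 mm line length, the left (feminine) to right (masculine) orientation, and the function names are assumptions, not the SGBA-5’s own scoring instructions (which are in S1 File).

```python
# Hypothetical VAS scoring sketch; the line length and orientation are assumptions.


def vas_position_percent(mark_mm: float, line_length_mm: float = 100.0) -> float:
    """Convert a mark's distance from the feminine (left) anchor into % of the continuum."""
    if line_length_mm <= 0:
        raise ValueError("line_length_mm must be positive")
    if not 0.0 <= mark_mm <= line_length_mm:
        raise ValueError("the mark must lie on the line")
    return 100.0 * mark_mm / line_length_mm


def position_difference(a_percent: float, b_percent: float) -> float:
    """Difference between two positions, in percentage points of the continuum.

    Differences are interpretable; ratios (e.g. "twice as masculine") are not,
    because the continuum is not assumed to have a true zero.
    """
    return b_percent - a_percent


if __name__ == "__main__":
    p1 = vas_position_percent(30.0)  # 30.0
    p2 = vas_position_percent(45.0)  # 45.0
    print(position_difference(p1, p2))  # 15.0
```

Expressing positions as percentages makes items recorded on lines of different physical lengths (for example, paper versus screen) comparable, which matters when the same tool is administered in both formats.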
We aimed to design a tool for within-study analysis of the impacts that sex or gender have on the study’s primary outcomes. The SGBA-5 is not designed to ‘determine’ any individual participant’s sex or gender, nor is it designed to be sensitive enough for research where sex or gender is the primary variable of interest. The SGBA-5 is not a replacement for dedicated work with marginalized sex and gender communities (e.g., trans and nonbinary individuals, intersex persons); however, it is designed so that it can be completed by persons of sex and gender minorities as part of studies where sex or gender are not primary variables. The SGBA-5 is intended to be integrated into clinical trials and research studies, enabling researchers to include multiple dimensions of SGBA in their studies. Completing the SGBA-5 takes one to two minutes, so it should not be onerous for participants or researchers. Additional information on the rationale for the scale methodology, appropriate application, and interpretation of the SGBA-5 is also included in S1 File.
After developing the first full version of the SGBA-5, we informally presented it to health sciences researchers at a Faculty seminar (n ≈ 30), and at a multi-university journal club meeting (n ≈ 25) to obtain feedback on the face validity of the SGBA-5. The five selected item topics and their respective methods of measurement remained consistent throughout the small-group feedback stage of development, but the presentation and phrasing of those items were updated and iterated upon throughout.
Delphi expert consensus
More rigorous initial evaluation of the SGBA-5’s suitability for use in health research involved a Delphi expert consensus of Canadian health researchers to assess the content validity of the SGBA-5 for within-sample SGBA. Generally, a Delphi study of content validity consists of a minimum of three rounds in which experts independently evaluate each proposed scale item and score it using a Likert scale or pass/fail rating [19,39]. These evaluations are communicated to the Delphi researchers, who pool and assess the expert feedback. From the second round onward, the researchers typically provide the Delphi experts with descriptive statistics and/or qualitative summaries of the previous round’s anonymized expert feedback [40,41]. This anonymized feedback gives experts an overview of the other Delphi experts’ opinions, which they can use to inform their evaluation in that round. A Delphi study is halted once the experts reach a stable consensus, once a pre-determined maximum number of rounds has been reached (not used in this study), or once the between-round difference in the Delphi experts’ ratings drops below a predetermined threshold (used in this study) [39–41].
The purpose of this Delphi Expert Consensus study was to receive feedback on the construct validity of the SGBA-5’s scale items from a sample of Canadian health sciences researchers (this study’s Delphi experts and the most likely initial users of the SGBA-5). This process provided evidence that each item measures what it is proposed to measure (an item’s content validity) [18,19]. In accordance with the threshold of evidence used to initially select the items for inclusion in the SGBA-5, it was decided a priori that any major changes (e.g., adding a new item, switching from continuous to ordinal measures) resulting from the Delphi experts’ feedback must reflect evidence in the current health sciences literature.
Participants
Beginning January 10th, 2023, the authors (AP and SD) contacted health science Deans at institutions across Canada and asked them to recommend 1–2 potential participants. Specifically, we asked them to identify researchers who had expertise in conducting health research studies with human participants and who had previously been involved in CIHR-funded research (to ensure familiarity with the CIHR sex and gender definitions in health research). By the end of participant recruitment on Feb 14th, 2023, 32 researchers had been recommended to the authors as potential experts for the Delphi study, 17 of whom consented to participate. Fourteen experts (82%) participated in all three rounds. The Delphi experts represented institutions spanning seven provinces and one territory. All participants provided written consent prior to participation in the study. The Delphi study was reviewed by and conducted in compliance with the regulations of the Ontario Tech University Research Ethics Board (REB # 17153).
Procedure
In the first round, participants were emailed a link to a survey in which they were presented each of the items from the SGBA-5 as well as the instruction page that would be provided to researchers administering the SGBA-5. Participants were asked to score each scale item from the SGBA-5 on a 1–5 Likert scale, with a score of 1 being “This question is not a valid measure of [scale item] for SGBA in health research”, and a score of 5 being “This question is a valid measure of [scale item] for SGBA in health research”. Participants could also provide optional written feedback on the construct validity of individual scale items or of the SGBA-5 as a whole.
In the second and third rounds, participants were asked to conduct the same rating exercise and were also provided with optional supplementary documentation that further explained the rationale for the SGBA-5, as well as a short Question & Answer-style summary of the SGBA-5’s creation, and appropriate use cases. In these rounds, participants were shown the median and interquartile ranges of the Likert scores from the previous round when rating each scale item. Furthermore, minor adjustments were made to the SGBA-5’s formatting or phrasing between rounds based on comments from the previous round.
Statistical analysis
The threshold for consensus agreement was set a priori to 75%. That is, for each item of the SGBA-5, 1) at least 75% of experts had to rate the item at least 4 out of 5 on the Likert scale, and 2) a statistical test had to demonstrate inter-round stability of the experts’ answers, after a minimum of three rounds. Alternatively, the Delphi study would be halted if the between-round difference in the coefficient of variation (ΔCV) fell below a predetermined threshold, which would indicate that the experts had reached a stable non-agreement consensus. These thresholds are in line with Delphi Expert Consensus best practices and intentionally use more conservative target measures to avoid overestimation of expert consensus [18,19,39–41].
The statistical tests for consensus and the overall summary statistics (median, IQR) for and between rounds were calculated after the completion of each round of the Delphi study.
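The consensus summaries described above can be sketched in Python as follows. This is not the authors’ analysis code (which was written in R); the ratings are hypothetical, the function names are illustrative, and because the specific inter-round stability test and the ΔCV cut-off are not restated here, the stopping check takes the cut-off as a caller-supplied parameter.

```python
import statistics


def percent_agreement(ratings, cutoff=4):
    """Percentage of experts rating an item at or above `cutoff` on the 1-5 scale."""
    return 100.0 * sum(r >= cutoff for r in ratings) / len(ratings)


def median_iqr(ratings):
    """Median and interquartile range, as shown to experts between rounds."""
    q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    return statistics.median(ratings), q3 - q1


def coefficient_of_variation(ratings):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(ratings) / statistics.mean(ratings)


def meets_agreement(ratings, threshold=75.0):
    """A priori rule from the text: at least 75% of ratings are 4 or 5."""
    return percent_agreement(ratings) >= threshold


def cv_change_below(prev_ratings, curr_ratings, max_change):
    """Hypothetical stopping check: between-round change in CV under `max_change`.

    The cut-off is a parameter because its value is not restated here.
    """
    change = abs(coefficient_of_variation(curr_ratings)
                 - coefficient_of_variation(prev_ratings))
    return change < max_change


if __name__ == "__main__":
    # Hypothetical ratings from 14 experts for one item in one round.
    item = [5] * 12 + [4, 3]
    print(round(percent_agreement(item), 1))  # 92.9
    print(median_iqr(item))                   # (5.0, 0.0)
    print(meets_agreement(item))              # True
```

With 14 experts, each expert moves the agreement percentage by about 7 points, which is why the reported values (e.g., 93%, 64%, 57%, 50%) fall on multiples of 1/14.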
Test-retest study
A test-retest component aimed to assess the reliability of each scale item in the SGBA-5 in two populations.
Participants
The test-retest reliability of the SGBA-5 was assessed in two separate populations, university students and older adults. These two populations were recruited and assessed separately in an in-person student arm, and a virtually administered older adult arm.
Participants in the student arm were recruited from the student participant pools of the Kinesiology and Psychology programs at Ontario Tech University between September 4th, 2023, and November 13th, 2023. Inclusion was limited to current Ontario Tech University students who were able to attend two in-person sessions and to communicate in English. The older adult arm was recruited between September 11th, 2023, and February 2nd, 2024 through Ontario Tech University’s Age-Friendly Campus email newsletter and through Facebook advertisements targeting older adults in the Durham Region. Participants were eligible for the older adult arm if they were 55 years of age or older and were capable of completing, or had access to assistance in completing, the SGBA-5 via a web-hosted survey (the URL was emailed to participants after they indicated to the research team that they were interested in participating). All participants in both arms provided written consent prior to participation in the study. The test-retest study was approved by and conducted in compliance with the regulations of the Ontario Tech University Research Ethics Board (REB # 17477).
Procedure
Eligible participants completed the SGBA-5 twice, at least two weeks apart; they also completed a demographic questionnaire prior to completing the SGBA-5 for the first time. For the students, the SGBA-5 was completed on paper in-person; for the older adults, the SGBA-5 was completed online.
Statistical analysis
The primary test statistics used to evaluate test-retest reliability in each sample were Cohen’s kappa (κ) coefficient of agreement for the categorical sex variable and the intraclass correlation coefficient of agreement (ICC) for the four gendered aspect of health continuum variables, each evaluated at the a priori significance level (α). The thresholds for appropriate reliability of the SGBA-5 for use in research were set a priori for both κ and the ICC, with 0.70 as the ICC threshold [18,42–45]. P-values are reported but were not used as a threshold to determine scale item reliability, because the magnitudes of the κ and ICC coefficients (how similar a participant’s scores are between test and retest) assess a measurement tool’s reliability more directly than the probability that the tool’s results could be due to chance [18,44]. Secondary reliability analyses quantified the standard error of measurement (SEM) for each of the four gendered aspect of health continuums, and sensitivity analyses were conducted using the samples’ demographic variables. SEM results are presented as percentages of the feminine-masculine continuum used to quantify the four gendered aspects of health addressed in this tool. These SEM percentages reflect the minimum difference (in % of the full continuum) between two observations of the same individual that would be needed to find a difference over time that is unlikely to be explained entirely by error [44]. Because this validity and reliability analysis does not assess whether participants’ experiences of these gendered aspects change meaningfully over time, the SEM percentages are instead presented to show the typical variation that could be expected in an individual participant’s answers to the gendered aspect of health items. This allows more confident identification of meaningful differences between participants: if the difference between the responses of participant 1 and participant 2 is larger than the SEM of that item, we can suggest that there is likely a true difference between the participants on that item.
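The reliability statistics described above can be sketched in Python with NumPy. This is a hypothetical illustration, not the authors’ R analysis (which used the irr package): it assumes the ICC is the two-way, absolute-agreement, single-measures form, ICC(A,1), and it assumes the SEM is computed with the common formula SEM = SD × √(1 − ICC), using the SD of all test and retest scores on a 0–100 (% of continuum) scale. The function names are illustrative.

```python
from collections import Counter

import numpy as np


def cohens_kappa(test, retest):
    """Cohen's kappa for paired categorical answers (e.g. the sex item)."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two non-empty lists of equal length")
    n = len(test)
    p_observed = sum(a == b for a, b in zip(test, retest)) / n
    c_test, c_retest = Counter(test), Counter(retest)
    p_expected = sum(c_test[c] * c_retest[c] for c in set(c_test) | set(c_retest)) / n**2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


def icc_agreement(scores):
    """Two-way, absolute-agreement, single-measures ICC (ICC(A,1)).

    `scores` is an n x k array: one row per participant, one column per occasion.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between participants
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between test and retest
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def sem_percent(scores):
    """SEM = SD * sqrt(1 - ICC), in the units of `scores` (here % of the continuum)."""
    x = np.asarray(scores, dtype=float)
    icc = icc_agreement(x)
    sd = x.std(ddof=1)  # SD of all test and retest scores (an assumption)
    return sd * np.sqrt(max(0.0, 1.0 - icc))


if __name__ == "__main__":
    print(cohens_kappa(["F", "F", "M", "M"], ["F", "M", "M", "M"]))  # 0.5
    # A constant 1-point shift keeps the rank order but lowers agreement:
    print(icc_agreement([[1, 2], [2, 3], [3, 4]]))  # approximately 0.667
```

The toy data show why an agreement ICC suits test-retest work: a systematic one-point shift between occasions leaves the rank order intact (a consistency ICC would be 1) but lowers ICC(A,1) to 2/3.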
All statistical analyses were conducted in the R statistical programming language (version 4.3.1: Beagle Scouts) along with the packages tidyverse (version 2.0.0), irr (version 0.84.1), tableone (version 0.13.2), and ggpubr (version 0.6.0) [46–50].
Sample size
The target minimum sample size of 62 participants per arm completing the first (test) component of the study was derived by first plotting predicted minimum sample sizes using the Intraclass Transformation and Optimal Design Approximation methods across a range of predicted reliability coefficients, confidence intervals, and minimum acceptable reliability coefficients [18,51,52]. Then, the more conservative estimate (n = 42) was identified for a predicted reliability coefficient of 0.85 (CI from 0.80 to 0.90). This number was rounded up to 50 to mitigate potential overestimation of the predicted reliability, and after accounting for a potential dropout rate of 25%, the target minimum sample size for the test-retest study was set as n = 62 [18].
Results
Initial design and development
The initial design process and validity testing culminated in the creation of the first full version of the SGBA-5. This version used five items (one biological sex item and four gendered aspect of health items) with phrasings similar to those presented in Table 1, which reports the phrasings used in version 1.0 of the SGBA-5; version 1.0 of the SGBA-5 can be found in S1 File. Small-group feedback from health sciences researchers suggested that the SGBA-5 had good face validity for use in health sciences research where sex or gender are not the primary focus. The SGBA-5 was then presented to the Delphi experts to assess its content validity.
Delphi expert consensus
Summary statistics from the Delphi study are presented in Table 2. The Delphi study was concluded after three rounds. The experts reached a consensus on all five items; only the categorical sex question met the a priori threshold for agreement of 75% of experts rating the item 4 or 5 out of 5 (93% agreement, median = 5.0, IQR = 0.0). Fig 2 is a stacked bar plot that shows the distribution of ratings for each scale item in the final round of the study. The optional feedback provided by the Delphi experts did not provide new insight into, or constructive critique of, the SGBA-5.
Test-retest study
Of the eligible student participants that completed the test round of the test-retest study (n = 102), 87% (n = 89) completed both rounds. Among older adults, 89 completed the first round of the test-retest study, and 80% (n = 71) completed both rounds. Table 3 displays the demographic characteristics for both the university student and older adult arms of the test-retest study.
Results from the university students who completed both rounds of the test-retest study are presented in Table 4. There was complete agreement between test and retest answers for the categorical sex item. All four gendered aspect of health items had ICCs greater than the a priori threshold of 0.70, ranging from the lowest ICC for the gender roles item to the highest for the gender identity item.
Sensitivity analyses of the student arm’s gendered aspect of health coefficients showed no significant differences when grouped by demographic characteristics, and all subgroup ICCs were greater than the 0.70 a priori threshold. More detailed results for each sensitivity analysis are presented in the S1 Table.
Results from the older adult arm of the test-retest study are presented in Table 5. There was perfect agreement between test and retest answers for the categorical sex item. All four gendered aspect of health items had ICCs greater than the a priori threshold of 0.70, ranging from the lowest ICC for the gender roles item to the highest for the gendered relations item. In sensitivity analyses of the older adult arm stratified by self-reported demographic characteristics, all sub-groups scored higher than the a priori threshold of 0.70; these results are detailed in the S2 Table.
Discussion
The purpose of the work presented in this paper was to conduct initial assessment of a novel tool for SGBA. The tool was designed to provide more meaningful insight into the variation and multidimensionality of sex and gender than what is possible using categorical option tools, without also significantly increasing the time commitment and workload for the researchers who use it. A strength of the proposed SGBA-5 tool is that the responses of gender-diverse persons can be included and analyzed without necessitating the large sample sizes (often an n of 500 or more assuming a 1% proportion) that are required to analyze small-proportion nominal groups.
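The sample-size point above can be made concrete with a binomial calculation. This Python sketch assumes participants are sampled independently and that 1% of the population belongs to a given small nominal group (the proportion used in the text); the minimum subgroup size of five is a hypothetical cut-off chosen only for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


if __name__ == "__main__":
    p = 0.01  # proportion of a small nominal group, as in the text
    k = 5     # hypothetical minimum subgroup size for a stratified analysis
    for n in (100, 500, 2000):
        expected = n * p
        print(f"n={n}: expected count {expected:.0f}, "
              f"P(at least {k}) = {prob_at_least(k, n, p):.2f}")
```

Even at n = 500 the expected subgroup count is only about five, which is why continuous measures that every participant answers can be more informative than small nominal categories in typical health study samples.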
This paper reports the first formal validity and reliability testing of the SGBA-5. Specifically, our testing was designed to determine whether the SGBA-5 was robust enough to recommend further assessment and trial use in health sciences research studies in which sex or gender are not primary variables. The Delphi expert consensus study on the SGBA-5’s validity provided strong support for the biological sex item. While the gendered aspect of health items did not meet the conservative a priori threshold for consensus agreement (75%), the items were not rejected either. The test-retest study of the SGBA-5’s reliability demonstrated strong reliability for all items in both population arms, which was further bolstered by the robustness of the reliability coefficients in secondary analyses and by narrow SEM margins. To our knowledge, this is the first SGBA tool that has been created with the specific intention of enabling researchers to integrate SGBA into health sciences research when sex or gender are not primary variables of interest.
The Delphi expert consensus study of health researchers from across Canada rated the SGBA-5’s biological sex item as highly valid, which is consistent with our expectations, since a nominal-categorical item is already the most commonly used and accepted way of incorporating biological sex into health research outside of specialized areas of sex-based research [2,6,8,11,12]. The Delphi experts’ ratings of the gendered aspects of health items that used a feminine-masculine continuum did not meet our intentionally conservative a priori consensus target of at least 75% of validity ratings being 4 or 5 out of 5, but all of the items had median ratings above 3 out of 5, with at least 50% of ratings being 4 or 5 out of 5; both of these are thresholds that have been used to define consensus in other Delphi expert consensus studies [18,40,41,53–56]. This suggests that while experts were not unanimous in their endorsement of the gendered aspects of health items that use a feminine-masculine continuum, they did not view any of the items as invalid either. This mixed response from the Delphi experts may reflect the broader inconsistencies in health researchers’ definitions of sex and gender in a health science context. As Haverfield and Tannenbaum noted in their evaluation of nearly 40,000 grant applications, both researchers and grant evaluators had difficulty consistently applying the principles of SGBA despite having to complete mandatory education modules on the topic [10]. It is perhaps unsurprising that the health researchers in our study, whose primary focus is not sex or gender research, were hesitant to give a higher validity rating to scale items that operationalize gendered aspects of health.
It is also possible that some of the Delphi experts’ validity ratings were negatively influenced by a misunderstanding of the purpose, use cases, or scope of the SGBA-5, particularly given the current diversity of understandings and conceptualizations of gender [2,6,8,10,26,57–60].
The test-retest reliability study demonstrated strong reliability for all SGBA-5 items across both study arms. The student and older adult arms had perfect test-retest reliability for the biological sex item, and the reliability coefficients for the gendered aspect of health continuum items were all well above the a priori acceptable reliability threshold (ICC of 0.70, evaluated at the a priori significance level). Both the student and older adult samples were relatively homogeneous, so these results should not be generalized to all university students or all older adults; however, the reliability results for all the gendered aspect of health items were further supported by every sub-group coefficient in the demographic-based sensitivity analyses also surpassing the a priori reliability threshold. Additionally, the SEM percentages calculated from the test-retest data (ranging from 9.2% for the gender identity item in the student arm to 17.8% for the gender roles item in the older adult arm) support the supposition that an analogue continuum measure can provide more precise and detailed information on the variation within gendered aspects of health than nominal or most ordinal measures could.
The SGBA-5 can allow health sciences researchers to incorporate SGBA more easily into study designs where sex and gender are not the primary focus. The SGBA-5 is designed to be easily added to existing demographic questionnaires or other pre-study procedures already used by researchers. On paper, the tool takes up a maximum of one page, and it takes up a similar amount of space when administered digitally. Combined with the SGBA-5’s quick time-to-completion (1–2 minutes), we hope that the SGBA-5 represents a more implementable option for researchers who want to incorporate more detailed SGBA into their work. The SGBA-5 provides researchers with a way to measure sex and gender separately, beyond the often-used categorical sex and gender tick-box options (which are unlikely to be differentiable unless the study’s sample size is very large). The SGBA-5 can provide researchers with descriptive insights into their sample (such as whether the sample has an uneven distribution across one or more of the gendered aspects of health) and allows assessment of potential confounding between biological sex, the four measured gendered aspects of health, and the study’s primary outcome[s] of interest. A more detailed example data analysis using simulated SGBA-5 and outcome measures can be found at https://github.com/putman-a/SGBA-5_example_analysis [38].
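As a rough illustration of the confounding assessment mentioned above, the following Python sketch simulates data in which a hypothetical gender roles score (0–100, as % of the continuum) differs by sex and drives the outcome, then compares the sex coefficient from ordinary least squares with and without adjustment for that score. All variable names, effect sizes, and the 0/1 sex coding are assumptions; this is not the analysis in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(2025)
n = 200

# Hypothetical 0/1 coding of the biological sex item (1 = female).
female = rng.integers(0, 2, size=n)
# Hypothetical gender roles score in % of the continuum, lower on average for females.
roles = np.clip(55 - 20 * female + rng.normal(0, 15, size=n), 0, 100)
# Hypothetical outcome that depends on the roles score, not directly on sex.
outcome = 10 + 0.1 * roles + rng.normal(0, 2, size=n)


def ols_coefficients(y, *predictors):
    """Least-squares coefficients for y ~ intercept + predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


crude = ols_coefficients(outcome, female)[1]
adjusted = ols_coefficients(outcome, female, roles)[1]
print(f"sex coefficient: crude {crude:.2f}, adjusted for roles {adjusted:.2f}")
```

In this simulated setting the crude sex coefficient is close to -2 while the adjusted coefficient is close to 0, because the apparent sex effect runs entirely through the roles score; with real data, the direction and size of any change would depend on the study.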
Strengths of the validity assessment of the SGBA-5 include the breadth of researcher insight in the Delphi study, achieved by recruiting experts from institutions across Canada, and the more conservative consensus thresholds used when assessing the SGBA-5's validity. Strengths of the test-retest reliability assessment include conducting test-retest with two separate populations using two different administration media (paper and online), and conducting sensitivity analyses of the reliability coefficients across sample demographics. A potential limitation of the Delphi expert consensus study was the method of expert selection. The research team attempted to mitigate experts' self-selection bias by contacting health sciences Deans of Canadian institutions and asking them to nominate a researcher to participate, but this type of sampling does not necessarily yield a sample representative of all Canadian health researchers who could have provided expertise for this Delphi study. Further, the experts' assessments of the SGBA-5's validity quickly reached stability (for all items, between the 2nd and 3rd rounds), which may suggest that a larger panel, or experts with more diverse opinions, could provide more insight into the SGBA-5's validity. Additionally, while conducting the test-retest study with two populations is a strength, these results should not be overgeneralized: the online survey given to the older adult sample only included those who could comfortably complete the questionnaire on an internet-enabled device, and the university student sample was recruited from just two programs (Kinesiology and Psychology), which may not be representative of Canadian university students in general. Further evaluation of the SGBA-5 should investigate the different aspects of scale validity and reliability across diverse population groups.
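The consensus and stability checks referred to above can be sketched in Python. The consensus rule (at least 75% of experts rating an item 4 or higher on the 5-point Likert scale) follows the threshold reported for this study. The stability rule, a change in percent agreement of less than 15 percentage points between consecutive rounds, is a hypothetical assumption chosen for illustration, as are the example panel ratings and all function names; the study's actual stability criterion is described in its Methods.

```python
from collections.abc import Sequence

AGREE_RATING = 4          # a rating of 4 or 5 on the 5-point Likert scale counts as agreement
CONSENSUS_PERCENT = 75.0  # percent of experts required for consensus agreement


def percent_agreement(ratings: Sequence[int]) -> float:
    """Percentage of experts who rated the item AGREE_RATING or higher."""
    return 100.0 * sum(r >= AGREE_RATING for r in ratings) / len(ratings)


def reached_consensus(ratings: Sequence[int]) -> bool:
    """Consensus agreement if at least CONSENSUS_PERCENT of experts agree."""
    return percent_agreement(ratings) >= CONSENSUS_PERCENT


def is_stable(previous: Sequence[int], current: Sequence[int],
              max_change: float = 15.0) -> bool:
    """Hypothetical stability rule: percent agreement moves by less than
    `max_change` percentage points between consecutive rounds."""
    return abs(percent_agreement(current) - percent_agreement(previous)) < max_change


# Hypothetical ratings from a 14-member panel (illustrative, not the study's data).
sex_item_r3 = [5, 5, 4, 5, 4, 5, 5, 4, 5, 5, 4, 5, 3, 5]
roles_item_r2 = [4, 3, 5, 4, 4, 3, 4, 3, 4, 2, 5, 3, 4, 3]
roles_item_r3 = [4, 3, 5, 2, 4, 3, 4, 3, 4, 2, 5, 3, 4, 3]

for name, ratings in [("sex", sex_item_r3), ("roles", roles_item_r3)]:
    print(name, round(percent_agreement(ratings), 1), reached_consensus(ratings))
print("roles stable:", is_stable(roles_item_r2, roles_item_r3))
```

With a 14-member panel, each expert moves percent agreement by about 7.1 points, so 11 of 14 experts (78.6%) is the smallest count that meets the 75% threshold; 10 of 14 (71.4%) falls short.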
Conclusion
To our knowledge, this paper reports the first validity and reliability testing of a novel tool designed for implementing SGBA in health sciences research where sex or gender are not primary variables. We found that all five items in the SGBA-5 were reliable, that the Delphi expert panel deemed the biological sex item valid, and that measuring the gendered aspects of health items on a feminine-masculine continuum may be valid for use in health sciences research. Our testing suggests that the SGBA-5 shows promise for further development and application.
Supporting information
S1 File. Sex- and gender-based analysis tool 5-item (v1.0).
https://doi.org/10.1371/journal.pone.0323834.s001
(PDF)
S1 Table. Sensitivity analyses of gendered aspects of health test-retest reliability coefficients: Student arm.
https://doi.org/10.1371/journal.pone.0323834.s002
(DOCX)
S2 Table. Sensitivity analyses of gendered aspects of health test-retest reliability coefficients: Older adult arm.
https://doi.org/10.1371/journal.pone.0323834.s003
(DOCX)
References
- 1. Canadian Institutes of Health Research. What is gender? What is sex? In: Canadian Institutes of Health Research. 10 Jan 2014 [cited 27 Sep 2022]. Available: https://cihr-irsc.gc.ca/e/48642.html
- 2. Canadian Institutes of Health Research Government of Canada. How to integrate sex and gender into research. 12 Feb 2018 [cited 5 Jan 2023]. Available: https://cihr-irsc.gc.ca/e/50836.html
- 3. Natural Sciences and Engineering Research Council of Canada. Guide for Applicants: Considering Equity, Diversity and Inclusion in Your Application. NSERC; 2017. Available: https://www.nserc-crsng.gc.ca/_doc/EDI/Guide_for_Applicants_EN.pdf
- 4. Social Sciences and Humanities Research Council of Canada Government of Canada. Gender-based analysis plus. In: Social Sciences and Humanities Research Council. 2 Dec 2022 [cited 5 Jan 2023]. Available: https://www.sshrc-crsh.gc.ca/about-au_sujet/publications/drr/2021-2022/gba_plus-acs_plus-eng.aspx
- 5. Health Canada. Health Portfolio Sex and Gender-Based Analysis Policy: Advancing Equity, Diversity and Inclusion. In: Health Canada. 29 Jun 2017 [cited 27 Sep 2022]. Available: https://www.canada.ca/en/health-canada/corporate/transparency/corporate-management-reporting/heath-portfolio-sex-gender-based-analysis-policy.html
- 6. Johnson J, Greaves L, Repta R. Better science with sex and gender: a primer for health research. Women’s Health Research Network. 2007. https://cewh.ca/wp-content/uploads/2012/05/2007_BetterSciencewithSexandGenderPrimerforHealthResearch.pdf
- 7. World Health Organization. Strategy for integrating gender analysis and actions into the work of WHO. In: World Health Organization. 2007 [cited 11 Nov 2022]. Available: https://www.who.int/publications-detail-redirect/WHO-FCH-GWH-08.1
- 8. Heidari S, Babor TF, De Castro P, Tort S, Curno M. Sex and gender equity in research: rationale for the SAGER guidelines and recommended use. Res Integr Peer Rev. 2016;1:2. pmid:29451543
- 9. National Institutes of Health. NIH Inclusion Outreach Toolkit: How to Engage, Recruit, and Retain Women in Clinical Research. In: National Institutes of Health: Office of Research on Women’s Health. 2023 [cited 18 Dec 2023]. Available: https://orwh.od.nih.gov/toolkit/nih-policies-inclusion/guidelines
- 10. Haverfield J, Tannenbaum C. A 10-year longitudinal evaluation of science policy interventions to promote sex and gender in health research. Health Res Policy Syst. 2021;19(1):94. pmid:34130706
- 11. Institute of Gender and Health. Sex and Gender Training Modules. In: Canadian Institutes of Health Research: Institute of Sex and Gender. Available: https://www.cihr-irsc-igh-isfh.ca/
- 12. Lowik AJ. Gender & sex in methods & measurement toolkit. Centre for Gender & Sexual Health Equity. 2022.
- 13. Connell R. Gender, health and theory: conceptualizing the issue, in local and world perspective. Soc Sci Med. 2012;74(11):1675–83. pmid:21764489
- 14. Putman A. Initial validity and reliability testing of the SGBA-5. Thesis, Ontario Tech University. 2024. Available: https://ontariotechu.scholaris.ca/handle/10155/1793
- 15. Committee on Measuring Sex, Gender Identity, and Sexual Orientation, Committee on National Statistics, Division of Behavioral and Social Sciences and Education, National Academies of Sciences, Engineering, and Medicine. Measuring Sex, Gender Identity, and Sexual Orientation. Bates N, Chin M, Becker T, editors. Washington, D.C.: National Academies Press; 2022. p. 26424. https://doi.org/10.17226/26424
- 16. Federal Committee on Statistical Methodology. Updates on Terminology of Sexual Orientation and Gender Identity Survey Measures. 2020. Available: https://nces.ed.gov/FCSM/pdf/FCSM_SOGI_Terminology_FY20_Report_FINAL.pdf
- 17. Medeiros M, Forest B, Öhberg P. The case for non-binary gender questions in surveys. APSC. 2019;53(1):128–35.
- 18. Streiner D, Norman G, Cairney J. Health measurement scales: a practical guide to their development and use. 5th ed. Oxford University Press; 2015.
- 19. Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR, Young SL. Best practices for developing and validating scales for health, social, and behavioral research: a primer. Front Public Health. 2018;6:149. pmid:29942800
- 20. Keszei AP, Novak M, Streiner DL. Introduction to health measurement scales. J Psychosom Res. 2010;68(4):319–23. pmid:20307697
- 21. Bem SL. The measurement of psychological androgyny. J Consult Clin Psychol. 1974;42(2):155–62. pmid:4823550
- 22. Donnelly K, Twenge JM. Masculine and feminine traits on the Bem Sex-Role Inventory, 1993–2012: a cross-temporal meta-analysis. Sex Roles. 2016;76(9–10):556–65.
- 23. Nielsen MW, Stefanick ML, Peragine D, Neilands TB, Ioannidis JPA, Pilote L, et al. Gender-related variables for health research. Biol Sex Differ. 2021;12(1):23. pmid:33618769
- 24. Lacasse A, Pagé MG, Choinière M, Dorais M, Vissandjée B, Nguefack HLN, et al. Conducting gender-based analysis of existing databases when self-reported gender data are unavailable: the GENDER Index in a working population. Can J Public Health. 2020;111(2):155–68. pmid:31933236
- 25. Gill S, Stockard J, Johnson M, Williams S. Measuring gender differences: The expressive dimension and critique of androgyny scales. Sex Roles. 1987;17(7–8):375–400.
- 26. Horstmann S, Schmechel C, Palm K, Oertelt-Prigione S, Bolte G. The operationalisation of sex and gender in quantitative health-related research: a scoping review. Int J Environ Res Public Health. 2022;19(12):7493. pmid:35742742
- 27. Gerdes ZT, Levant RF. Complex relationships among masculine norms and health/well-being outcomes: correlation patterns of the conformity to masculine norms inventory subscales. Am J Mens Health. 2018;12(2):229–40. pmid:29219033
- 28. Mahalik JR, Morray EB, Coonerty-Femiano A, Ludlow LH, Slattery SM, Smiler A. Development of the Conformity to Feminine Norms Inventory. Sex Roles. 2005;52(7–8):417–35.
- 29. Tibubos AN, Otten D, Beutel ME, Brähler E. Validation of the Personal Attributes Questionnaire-8: Gender Expression and Mental Distress in the German Population in 2006 and 2018. Int J Public Health. 2022;67:1604510. pmid:35370535
- 30. The Ontario HIV Treatment Network. Barriers to accessing health care among transgender individuals. The Ontario HIV Treatment Network; 2017. Available: https://www.ohtn.on.ca/rapid-response-barriers-to-accessing-health-care-among-transgender-individuals/
- 31. Claahsen-van der Grinten H, Verhaak C, Steensma T, Middelberg T, Roeffen J, Klink D. Gender incongruence and gender dysphoria in childhood and adolescence-current insights in diagnostics, management, and follow-up. Eur J Pediatr. 2021;180(5):1349–57. pmid:33337526
- 32. Novak JR, Peak T, Gast J, Arnell M. Associations Between Masculine Norms and Health-Care Utilization in Highly Religious, Heterosexual Men. Am J Mens Health. 2019;13(3):1557988319856739. pmid:31184245
- 33. Caddick N, Smith B, Phoenix C. Male combat veterans’ narratives of PTSD, masculinity, and health. Sociol Health Illn. 2015;37(1):97–111. pmid:25601067
- 34. Velasco E, Dieleman E, Supakankunti S, Thi Mai Phoung T. Study on the gender aspects of the avian influenza crisis in southeast asia. Directorate General External Relations. 2008.
- 35. Grekou D, Yuquin L. Gender differences in employment one year into the COVID-19 pandemic: an analysis by industrial sector and firm size. Econ Soc Rep. 2021.
- 36. Banco D, Chang J, Talmor N, Wadhera P, Mukhopadhyay A, Lu X, et al. Sex and Race Differences in the Evaluation and Treatment of Young Adults Presenting to the Emergency Department With Chest Pain. J Am Heart Assoc. 2022;11(10):e024199. pmid:35506534
- 37. Samulowitz A, Gremyr I, Eriksson E, Hensing G. “Brave Men” and “Emotional Women”: A Theory-Guided Literature Review on Gender Bias in Health Care and Gendered Norms towards Patients with Chronic Pain. Pain Res Manag. 2018;2018:6358624. pmid:29682130
- 38. Putman A, Dogra S. Example analyses: using the SGBA-5 with simulated data. https://github.com/putman-a/SGBA-5_example_analysis. 2024.
- 39. Brown B. Delphi process: a methodology used for the elicitation of opinions of experts. RAND Corporation. 1968. https://www.rand.org/pubs/papers/P3925.html
- 40. Beiderbeck D, Frevel N, von der Gracht HA, Schmidt SL, Schweitzer VM. Preparing, conducting, and analyzing Delphi surveys: Cross-disciplinary practices, new directions, and advancements. MethodsX. 2021;8:101401. pmid:34430297
- 41. Barrios M, Guilera G, Nuño L, Gómez-Benito J. Consensus in the Delphi method: What makes a decision change? Technological Forecasting and Social Change. 2021;163:120484.
- 42. Terwee CB, Bot SDM, de Boer MR, van der Windt DAWM, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42. pmid:17161752
- 43. Geerinck A, Alekna V, Beaudart C, Bautmans I, Cooper C, De Souza Orlandi F, et al. Standard error of measurement and smallest detectable change of the Sarcopenia Quality of Life (SarQoL) questionnaire: An analysis of subjects from 9 validation studies. PLoS One. 2019;14(4):e0216065. pmid:31034498
- 44. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19(1):231–40. pmid:15705040
- 45. Wilkinson L. Statistical methods in psychology journals. Am Psychol. 1999.
- 46. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2023. Available: https://www.R-project.org/
- 47. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et al. Welcome to the Tidyverse. JOSS. 2019;4(43):1686.
- 48. Gamer M, Lemon J, Singh I. irr: various coefficients of interrater reliability and agreement. https://cran.r-project.org/web/packages/irr/index.html. 2019.
- 49. Yoshida K. Tableone. https://github.com/kaz-yos/tableone. 2023.
- 50. Kassambara A. Ggpubr: ggplot2 based publication ready plots. https://rpkgs.datanovia.com/ggpubr/. 2023.
- 51. Fisher R. Statistical Methods for Research Workers. York University Classics in the History of Psychology. Original Publisher: Oliver and Boyd; 1925. Available: https://psychclassics.yorku.ca/Fisher/Methods/index.htm
- 52. Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Statist Med. 1998;17(1):101–10.
- 53. Dajani JS, Sincoff MZ, Talley WK. Stability and agreement criteria for the termination of Delphi studies. Technological Forecasting and Social Change. 1979;13(1):83–90.
- 54. von der Gracht HA. Consensus measurement in Delphi studies. Technological Forecasting and Social Change. 2012;79(8):1525–36.
- 55. de Villiers MR, de Villiers PJT, Kent AP. The Delphi technique in health sciences education research. Med Teach. 2005;27(7):639–43. pmid:16332558
- 56. Holey EA, Feeley JL, Dixon J, Whittaker VJ. An exploration of the use of simple statistics to measure consensus and stability in Delphi studies. BMC Med Res Methodol. 2007;7:52. pmid:18045508
- 57. Heise L, Greene ME, Opper N, Stavropoulou M, Harper C, Nascimento M, et al. Gender inequality and restrictive gender norms: framing the challenges to health. Lancet. 2019;393(10189):2440–54. pmid:31155275
- 58. Morais R, Bernardes S, Verdonk P. What is gender awareness in health? A scoping review of the concept, its operationalization, and its relation to health outcomes. Women Health. 2022;62(3):181–204. pmid:35220903
- 59. Colineaux H, Soulier A, Lepage B, Kelly-Irving M. Considering sex and gender in Epidemiology: a challenge beyond terminology. From conceptual analysis to methodological strategies. Biol Sex Differ. 2022;13(1):23. pmid:35550193
- 60. Stachenfeld NS, Mazure CM. Precision medicine requires understanding how both sex and gender influence health. Cell. 2022;185(10):1619–22. pmid:35561661