Abstract
Introduction
The selection of residents for medical specialty programmes is a critical yet resource-intensive process. Although structured evaluation tools, such as standardized application forms, enhance objectivity and reliability, they often require all committee members to assess every candidate, resulting in inefficiencies. This study aimed to determine the optimal number of assessors needed to reliably score the application forms of doctors applying for residency in obstetrics & gynaecology, without compromising selection outcomes.
Methods
This retrospective cohort study analysed data from three residency selection cycles (2022–2024) in the Northwest region of the Netherlands. Application forms were scored anonymously each year by 15–18 committee members, referred to as assessors, using a structured scoring system. Scores were analysed to identify the point at which adding more assessors no longer significantly impacted candidate rankings. Statistical measures included paired t-tests, correlations, Cronbach’s alpha, and intraclass correlation coefficients to assess internal consistency and reliability.
Results
The analysis showed that six assessors are sufficient to reliably assess candidates. Correlations between average scores from six assessors and the grand average consistently exceeded 0.9 across all cohorts, and Cronbach’s alpha stabilized above 0.85. Significant differences in rankings were observed when increasing assessors from two to six but diminished beyond six. Bland-Altman plots confirmed agreement between scores from six assessors and the overall committee evaluation.
Conclusion
A structured evaluation process (i.e., using standardized application forms) requiring six assessors per candidate ensures reliable and consistent outcomes while reducing workload. Implementing this approach can enhance efficiency without compromising fairness or objectivity in the selection of obstetrics & gynaecology residents. Future research should investigate the applicability of this model to other medical residency programmes internationally, and its impact on long-term performance.
Citation: Rietdijk WJR, Oostrom JK, Bakker PCAM (2025) Objective scoring of application forms in obstetrics and gynaecology residency selection: A retrospective cohort study on the optimal number of committee members. PLoS One 20(11): e0336478. https://doi.org/10.1371/journal.pone.0336478
Editor: Diana L. Gray, Washington University in St. Louis School of Medicine, UNITED STATES OF AMERICA
Received: August 12, 2025; Accepted: October 23, 2025; Published: November 19, 2025
Copyright: © 2025 Rietdijk et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The pseudonymized data and analysis code in R are available upon reasonable request from the medical ethics committee of the Amsterdam UMC. Please contact the AMC Medical Ethical Committee for additional information: https://metc.amsterdamumc.org/contact/.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Resident selection is a critical step in ensuring the quality of postgraduate training and, ultimately, patient care. In many specialties, including Obstetrics & Gynaecology (OBGYN), selection procedures are organized regionally and typically involve a two-stage process: an initial screening followed by interviews or assessment rounds, sometimes preceded by a national or central application round [1]. Traditionally, the first selection is based on applicants’ curriculum vitae (CV, or resume), motivational letters, and letters of recommendation. However, there is growing evidence that unstructured assessment of these documents is prone to bias and may favour applicants with stronger social networks or institutional capital, rather than merit alone [1–6].
To promote fairness, transparency, diversity, and consistency in selection, structured application forms are increasingly being used as an alternative to traditional CV-based selection in medical education [1,2,7–12]. These forms guide assessors and applicants in a more systematic and standardized evaluation of relevant competencies and experiences [7], thereby reducing bias and increasing reliability [13]. However, a key challenge in applying structured forms is managing inter-assessor variability: differences in how individual assessors interpret and score the same application. While structured tools can improve consistency and reliability [7,14], to our knowledge only a few studies have qualitatively assessed inter-assessor variability in the evaluation of medical students in clerkships [15], and in the selection of medical residents in particular [16].
At our OBGYN department, applicants are reviewed by all members of the selection committee. The annual process of selecting new residents is labour-intensive, posing challenges to sustainability in the context of a shrinking healthcare workforce and increasing clinical workload [17]. Since 2022, the selection procedure has been standardized by means of application forms and a standardized evaluation form that guide the assessors’ evaluations (see S1 Material), but its resource-intensive nature still required evaluation. This study aims to evaluate the stability of first-round applicant rankings based on structured application forms in a regional OBGYN residency selection process. Specifically, we seek to determine the minimum number of assessors required to achieve consistent and reliable outcomes. The underlying rationale is to identify the point at which additional assessors no longer contribute meaningful changes to candidate rankings, commonly referred to as the ‘saturation point’. Establishing this threshold is essential for optimizing the trade-off between fairness and resource use, particularly in the context of increasing workload and workforce shortages in healthcare. The results will inform the efficient design of transparent and equitable selection procedures in OBGYN and potentially other medical specialties.
Methods
Study context
This retrospective cohort study evaluated the selection process for OBGYN residents in the Northwest region of the Netherlands, conducted from 2022 to 2024. This region comprises eight hospitals, including one academic hospital. The number of residency positions available each year was nine in 2022 and eight in both 2023 and 2024. The number of applicants was 29 in 2022, 14 in 2023, and 21 in 2024, with 15 candidates invited for an interview each year. The selection committee typically included 15–18 members from various hospitals within the region, including gynaecologists and a few OBGYN residents. This study was approved by the medical ethics committee of Amsterdam UMC (approval number: 2023.0732). The data were extracted from the original forms and pseudonymized by the secretary of the department. The pseudonymized data were transferred to the authors for final statistical analysis on May 20th, 2024. The data and analysis files are available from the corresponding author upon reasonable request.
Selection process
Interested candidates submitted their CV and a motivation letter in response to the vacancy. The programme director and residency programme administrator reviewed these documents to ensure candidates met the eligibility criteria: possession of a medical degree, at least one year of clinical experience in obstetrics and gynaecology, and support from the department of OBGYN at the last hospital where the candidate worked. Candidates who met these criteria received a link to a digital application form, which collected general information, educational background, and clinical experience. The application form also inquired about the candidate’s experience in various domains, including societal impact, education and training, technical innovation, and research.
The digital application forms were anonymized and reviewed by the selection committee members, who scored the forms on a scale from 0 (no experience) to 2 (extensive experience) for each domain. Although a guideline (S1 Material) was provided to assist committee members in scoring the forms, they had some discretion in assigning scores. For example, in the research domain, a score of 0 was given for no publications, 1 for several publications, and 2 for completing a PhD. The highest-scoring candidates were then invited for structured interviews. Here, a committee member is referred to as an “assessor”.
Study outcome
This study focuses on the number of assessors, each rating every candidate, needed to reach the point at which adding more assessors no longer yields additional information. The primary outcome measure was therefore the stability of candidate selection, defined as the point at which adding assessors to the committee no longer results in a significant change in an applicant’s overall score.
Statistical analysis
The data were analysed in several steps. We recorded the scores assigned by each assessor to each candidate; each year, a total of 15–18 assessors evaluated the application forms of every candidate. Using these scores, we calculated average scores over randomly chosen subsets of assessors (2, 4, 6, 8, and 10) as well as the grand average across all assessors, and used paired t-tests to test for statistically significant differences between successive subset sizes. We also analysed the correlations between the grand average and the averages from 2, 4, 6, 8, and 10 randomly chosen assessors. Additionally, Cronbach’s alpha was calculated to assess internal consistency, which is identical to a two-way random effects intraclass correlation coefficient (ICC) [18]. The point at which adding more assessors no longer yielded additional information was indicated by the stabilization of the averages and the plateauing of the correlation and Cronbach’s alpha values. A correlation of 0.90 was used as the cutoff, and a Cronbach’s alpha of 0.85 indicated high internal consistency; at this saturation point, the average scores agreed with the grand average assessor score. The analysis was adapted from the methods of Rietdijk et al., Olvet and Hajcak, and Pontifex et al. [19–21]; the thresholds were reported in those studies, except for the ICC, for which values above 0.75 are considered sufficient [18]. Data for the three cohorts (2022, 2023, and 2024) were analysed separately.
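The saturation analysis described above can be sketched as follows. The authors’ actual analysis was performed in R on the study data, which are available only on request; this is a minimal Python illustration on synthetic scores, assuming a candidates-by-assessors score matrix (all sizes and values here are hypothetical stand-ins, not the study data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-in for the study data: 29 candidates, each scored by 16
# assessors. Each score is a shared "true quality" per candidate plus
# assessor-specific noise, on an arbitrary scale.
n_candidates, n_assessors = 29, 16
true_quality = rng.normal(loc=6.0, scale=1.5, size=(n_candidates, 1))
scores = true_quality + rng.normal(scale=1.0, size=(n_candidates, n_assessors))

def cronbach_alpha(ratings):
    """Cronbach's alpha, treating each assessor (column) as an 'item'."""
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

grand_avg = scores.mean(axis=1)  # grand average across all assessors
for n in (2, 4, 6, 8, 10):
    subset = rng.choice(n_assessors, size=n, replace=False)  # random assessors
    avg_n = scores[:, subset].mean(axis=1)
    r, _ = stats.pearsonr(avg_n, grand_avg)    # correlation with grand average
    alpha = cronbach_alpha(scores[:, subset])  # internal consistency
    _, p = stats.ttest_rel(avg_n, grand_avg)   # paired t-test vs. grand average
    print(f"n={n:2d}  r={r:.3f}  alpha={alpha:.3f}  p={p:.3f}")
```

On data with a genuine candidate effect, the correlation and alpha both rise and then plateau as `n` grows, which is the saturation behaviour the study looks for.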
Results
Baseline characteristics of the committee members
Baseline characteristics of the committee members, such as years of experience, role, gender, and whether they worked in an academic or non-academic hospital, are presented in Table 1. The majority of members were female and employed in a non-academic teaching hospital. Notably, an educational specialist was part of the committee only in the first year, 2022.
Cohort 2022
Fig 1 presents the results for the selection process in 2022. The upper panel shows the average scores for 2, 4, 6, 8, and 10 assessors compared to the grand average. The middle panel shows the correlation between the evaluations by random assessors (2, 4, 6, 8, and 10) and the grand average. The lower panel presents the Cronbach’s alpha for the evaluations of each pair of assessors. Significant differences in average ratings were found when comparing 2 vs. 4 assessors (p = 0.049), 4 vs. 6 assessors (p = 0.013), and 6 vs. 8 assessors (p = 0.012), while differences were not significant when comparing 8 vs. 10 assessors (p = 0.958) and 10 vs. the grand average (p = 0.073). Significant positive correlations above 0.90 (p < 0.05) were found between the average scores for 2, 4, 6, 8, and 10 assessors and the grand average. Cronbach’s alpha increased with the number of assessors, exceeding 0.85 from 6 assessors onward; the ICCs supported this result for the 2022 cohort (S1 Table). Fig 2 shows the Bland-Altman plot of agreement (bias = −0.52, 95% limits of agreement −2.79 to 1.76) between the evaluation by 6 assessors and the grand average, indicating agreement between the two.
In the upper panel, we show the average rating for each candidate for an increasing number of assessors (i.e., the average score of 2, 4, 6, 8, 10, and all randomly selected assessors). In the middle panel, we show the correlation between the average score of the increasing number of assessors and the grand average (i.e., all assessors). In the bottom panel, we present the Cronbach’s alpha for the increasing number of assessors. The results are for the 2022 cohort.
The Bland-Altman plot shows the agreement between the average score of the evaluations by six assessors and the average score of all assessors for the 2022 cohort.
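The Bland-Altman agreement statistics reported in this section (bias and 95% limits of agreement) can be computed as follows. This is a minimal sketch assuming two paired vectors of average scores; the numbers are purely illustrative, not the study data.

```python
import numpy as np

def bland_altman(a, b):
    """Bias and 95% limits of agreement between two paired measurement sets."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()                 # mean difference between the two methods
    sd = diff.std(ddof=1)              # SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Illustrative values only: average scores from six assessors vs. the grand
# average over all assessors for the same six candidates.
six = np.array([5.8, 6.4, 7.1, 4.9, 6.0, 5.5])
grand = np.array([6.1, 6.6, 7.0, 5.4, 6.3, 5.9])
bias, lo, hi = bland_altman(six, grand)
print(f"bias={bias:.2f}, 95% limits of agreement [{lo:.2f}, {hi:.2f}]")
```

A bias near zero with narrow limits of agreement, as reported for each cohort, indicates that the six-assessor average can substitute for the full-committee average.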
Cohort 2023
S1 Fig presents the results for the selection process in 2023. Significant differences in average ratings were found when comparing 2 vs. 4 assessors (p = 0.010) and 10 vs. the grand average (p = 0.009), while differences were not significant when comparing 4 vs. 6 assessors (p = 0.068), 6 vs. 8 assessors (p = 0.142), and 8 vs. 10 assessors (p = 0.827). Significant positive correlations above 0.880 (p < 0.05) were found between the average scores for 2, 4, 6, 8, and 10 assessors and the grand average. Cronbach’s alpha increased with the number of assessors, exceeding 0.85 from 4 assessors onward; the ICCs supported this result for the 2023 cohort (S1 Table). S2 Fig shows the Bland-Altman plot of agreement (bias = −0.49, 95% limits of agreement −1.57 to 0.60) between the evaluation by 6 assessors and the grand average, indicating agreement between the two.
Cohort 2024
S3 Fig presents the results for the selection process in 2024. Significant differences in average ratings were found when comparing 2 vs. 4 assessors (p < 0.001), 4 vs. 6 assessors (p = 0.016), and 6 vs. 8 assessors (p = 0.006), while differences were not significant when comparing 8 vs. 10 assessors (p = 0.568) and 10 vs. the grand average (p = 0.427). Significant positive correlations above 0.927 (p < 0.05) were found between the average scores for 2, 4, 6, 8, and 10 assessors and the grand average. Cronbach’s alpha increased with the number of assessors, exceeding 0.85 from 4 assessors onward; the ICCs supported this result for the 2024 cohort (S1 Table). S4 Fig shows the Bland-Altman plot of agreement (bias = 0.23, 95% limits of agreement −0.53 to 0.99) between the evaluation by 6 assessors and the grand average, indicating agreement between the two.
Discussion
Principal findings
This study aimed to determine the optimal number of assessors required to reliably score application forms for medical residency selection in OBGYN, ensuring that adding more assessors would not significantly alter selection outcomes. Our findings demonstrate that six assessors are sufficient to provide stable and reliable scores, as increasing the number beyond six did not result in meaningful changes in the evaluation of candidates. In other words, each applicant’s form needs to be evaluated by only six (randomly assigned) assessors, rather than by all assessors, to make a well-informed decision about inviting the applicant to the interview round, thereby making the selection process less labour-intensive.
Relevance of the findings and comparison with existing literature
The study’s results provide a clear threshold for committee size in residency selection, demonstrating that six randomly chosen assessors out of a larger pool are sufficient for reliable evaluations. The correlation between their average scores and the overall committee (grand) average consistently exceeded 0.9, while internal consistency (Cronbach’s alpha) stabilized above 0.85. These metrics show that a small group of assessors can produce reliable scores, which is crucial for high-stakes decisions in medical education. Previous research has emphasized the importance of structured and objective selection tools to reduce bias and improve the reliability and validity of the selection process, and our findings align with this evidence [6,8]. Structured evaluation forms, like those used in this study, enhance objectivity and fairness by reducing subjectivity and increasing inter-assessor reliability. The high consistency observed in this study (Cronbach’s alpha > 0.85 with six assessors) further supports the effectiveness of structured methods for achieving reliable and fair outcomes in residency selection [21]. These findings offer practical guidance for reducing workload while maintaining high-quality standards in selection processes.
Strengths and limitations
A major strength of this study is its systematic approach to evaluating assessor performance across three distinct cohorts. By analysing correlations, Cronbach’s alphas, and Bland-Altman plots, we provide robust evidence that six assessors suffice to produce reliable ratings.
However, the study is limited by its single-centre design, with data derived from one residency programme in the Netherlands, which may restrict the generalizability of the findings to other specialties or regions. That said, other specialty programmes have a set-up similar to ours; additional analysis of their data may produce results comparable to those of the present study and, in turn, provide further evidence for more efficient and objective resident selection processes. Furthermore, while the study focuses on optimizing efficiency, it does not evaluate whether the six-assessor model affects the diversity of selected candidates or its validity for predicting (long-term) performance in the residency programme. A meta-analysis found that more objective selection strategies are the most useful in association with longer-term performance [22], yet evidence on the impact of our standardized approach on longer-term performance is lacking [23]. Finally, the use of six assessors says nothing about the, perhaps homogeneous, composition of the selection committees; with a more diverse committee, the statistical analysis might yield a different number of required members.
Practical implications
Implementing a six-assessor, randomly selected model has immediate practical benefits for medical residency programmes. It reduces the workload for committee members, simplifies the logistical challenges of convening large committees, and maintains the reliability of the selection process. Assessor consistency is crucial for ensuring a fair selection process, demonstrating that candidates’ chances are not dependent on idiosyncratic biases of assessors. However, if assessors have similar backgrounds, they may still exhibit the same systematic biases. Such biases include, among others, the tendency of selection committees to select candidates who are similar to themselves. For example, male committee members are more likely to select men for jobs in general, and leadership roles in particular [24]. Hence, improving diversity may require additional systemic interventions. For example, selection committee members ought to be trained in how to address unconscious biases. Furthermore, the diversity of the selection committee itself needs to be prioritized, as diverse selection committees are more effective in fairly assessing candidates from varied backgrounds [5,25].
Systemic interventions must also address the recruitment of diverse candidate pools and the retention of diverse candidates throughout all career stages. Efforts such as mentorship programmes [5,26], pipeline initiatives [25,27], financial support [28], and inclusive admission policies [1] are essential to encourage underrepresented groups to pursue medical specialties. These strategies should complement the optimized selection process to ensure equitable opportunities and create a more representative physician workforce.
Future research
Future studies should assess whether the six-assessor model enhances equitable candidate selection and increases the diversity of selected candidates. Additionally, research should examine the long-term performance of candidates chosen through this model and its applicability to residency selection processes in other countries. Furthermore, comparative studies could evaluate the effectiveness of this model against alternative selection tools, such as structured application forms combined with interviews or AI-assisted evaluations, particularly in fostering diversity.
Conclusion
A structured evaluation process (i.e., using standardized application forms) requiring six assessors per candidate ensures reliable and consistent outcomes while reducing workload. Implementing this approach can enhance efficiency without compromising fairness or objectivity in the selection of obstetrics & gynaecology residents. Future research should investigate the applicability of this model to other medical residency programmes internationally, and its impact on long-term performance.
Supporting information
S1 Material. Guide for assessors to rate job applicants, freely translated from the Dutch language.
https://doi.org/10.1371/journal.pone.0336478.s005
(DOCX)
S1 Table. Intraclass correlations across all cohorts.
https://doi.org/10.1371/journal.pone.0336478.s006
(DOCX)
References
- 1. Patterson F, Ferguson E. Selection for medical education and training. In: Understanding medical education: Evidence, theory and practice. 2010. pp. 352–65.
- 2. Towaij C, Gawad N, Alibhai K, Doan D, Raîche I. Trust Me, I Know Them: assessing interpersonal bias in surgery residency interviews. J Grad Med Educ. 2022;14(3):289–94. pmid:35754644
- 3. Gennissen LM, Stegers-Jager KM, de Graaf J, Fluit CRMG, de Hoog M. Unraveling the medical residency selection game. Adv Health Sci Educ Theory Pract. 2021;26(1):237–52. pmid:32870417
- 4. Meijer RR, Neumann M, Hemker BT, Niessen ASM. A tutorial on mechanical decision-making for personnel and educational selection. Front Psychol. 2020;10:3002. pmid:32038385
- 5. Arjani S, Tasnim S, Sumra H, Zope M, Riner AN, Reyna C, et al. It begins with the search committee: Promoting faculty diversity at the source. Am J Surg. 2022;223(2):432–5. pmid:34482952
- 6. Levashina J, Hartwell CJ, Morgeson FP, Campion MA. The structured employment interview: narrative and quantitative review of the research literature. Personnel Psychol. 2013;67(1):241–93.
- 7. Patterson F, Knight A, Dowell J, Nicholson S, Cousans F, Cleland J. How effective are selection methods in medical education? A systematic review. Med Educ. 2016;50(1):36–60. pmid:26695465
- 8. Sackett PR, Zhang C, Berry CM, Lievens F. Revisiting meta-analytic estimates of validity in personnel selection: Addressing systematic overcorrection for restriction of range. J Appl Psychol. 2022;107(11):2040–68. pmid:34968080
- 9. Quintero AJ, Segal LS, King TS, Black KP. The personal interview: assessing the potential for personality similarity to bias the selection of orthopaedic residents. Acad Med. 2009;84(10):1364–72. pmid:19881423
- 10. Richardson M, Abraham C, Bond R. Psychological correlates of university students’ academic performance: a systematic review and meta-analysis. Psychol Bull. 2012;138(2):353–87. pmid:22352812
- 11. Fikrat-Wevers S. Selection and student diversity: The impact of tools, preparatory activities and applicant acceptance in health professions education. 2024.
- 12. Wolgast S, Bäckström M, Björklund F. Tools for fairness: increased structure in the selection process reduces discrimination. PLoS One. 2017;12(12):e0189512. pmid:29228052
- 13. Stayart CA, Brandt PD, Brown AM, Dahl T, Layton RL, Petrie KA, et al. Applying inter-rater reliability to improve consistency in classifying PhD career outcomes. F1000Res. 2020;9:8. pmid:32089837
- 14. Conway JM, Jako RA, Goodman DF. A meta-analysis of interrater and internal consistency reliability of selection interviews. J Appl Psychol. 1995;80(5):565–79.
- 15. Zaidi NLB, Kreiter CD, Castaneda PR, Schiller JH, Yang J, Grum CM, et al. Generalizability of competency assessment scores across and within clerkships: how students, assessors, and clerkships matter. Acad Med. 2018;93(8):1212–7. pmid:29697428
- 16. Fung BSC, Gawad N, Rosenzveig A, Raîche I. The effect of assessor professional background on interview evaluation during residency selection: a mixed-methods study. Am J Surg. 2023;225(2):260–5. pmid:35637019
- 17. Rietdijk WJR, van der Kuy PHM, den Uil CA. Human resource management at the intensive care unit: a pragmatic review and future research agenda for building a learning health system. Learn Health Syst. 2023;8(2):e10395. pmid:38633021
- 18. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–63. pmid:27330520
- 19. Rietdijk WJR, Franken IHA, Thurik AR. Internal consistency of event-related potentials associated with cognitive control: N2/P3 and ERN/Pe. PLoS One. 2014;9(7):e102672. pmid:25033272
- 20. Olvet DM, Hajcak G. The stability of error-related brain activity with increasing trials. Psychophysiology. 2009;46(5):957–61. pmid:19558398
- 21. Pontifex MB, Scudder MR, Brown ML, O’Leary KC, Wu C-T, Themanson JR, et al. On the number of trials necessary for stabilization of error-related brain activity across the life span. Psychophysiology. 2010;47(4):767–73. pmid:20230502
- 22. Kenny S, McInnes M, Singh V. Associations between residency selection strategies and doctor performance: a meta-analysis. Med Educ. 2013;47(8):790–800. pmid:23837425
- 23. Bustraan J, Dijkhuizen K, Velthuis S, van der Post R, Driessen E, van Lith JMM, et al. Why do trainees leave hospital-based specialty training? A nationwide survey study investigating factors involved in attrition and subsequent career choices in the Netherlands. BMJ Open. 2019;9(6):e028631. pmid:31175199
- 24. Bosak J, Sczesny S. Gender bias in leader selection? Evidence from a hiring simulation study. Sex Roles. 2011;65(3–4):234–42.
- 25. Kazmi MA, Spitzmueller C, Yu J, Madera JM, Tsao AS, Dawson JF, et al. Search committee diversity and applicant pool representation of women and underrepresented minorities: a quasi-experimental field study. J Appl Psychol. 2022;107(8):1414–27. pmid:34110855
- 26. Komaromy M, Grumbach K, Drake M, Vranizan K, Lurie N, Keane D, et al. The role of black and Hispanic physicians in providing health care for underserved populations. N Engl J Med. 1996;334(20):1305–10. pmid:8609949
- 27. Stanford FC. The importance of diversity and inclusion in the healthcare workforce. J Natl Med Assoc. 2020;112(3):247–9. pmid:32336480
- 28. Clayborne EP, Martin DR, Goett RR, Chandrasekaran EB, McGreevy J. Diversity pipelines: the rationale to recruit and support minority physicians. J Am Coll Emerg Physicians Open. 2021;2(1):e12343. pmid:33532751