Implicit bias in safety-aligned large language models: A multi-faceted evaluation of clinical decision-making and health equity

Qiufeng Jia; Yuhang Wen; Yuyan Liu; Hui Zhao; Qiongge Yu; Yu Long; Dan Sun; Yufeng Yu

doi:10.1371/journal.pone.0348819

Peer Review History

Original Submission: January 28, 2026
13 Mar 2026 Decision Letter - Thiago P. Fernandes, Editor
PONE-D-26-04379
Implicit bias in safety-aligned large language models: A multi-faceted evaluation of clinical decision-making and health equity
PLOS One
Dear Dr. Yu,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript by Apr 27 2026 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
- A letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future.
For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Thiago P. Fernandes, PhD Academic Editor PLOS One Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please note that PLOS One has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse. 3. Thank you for stating the following in the Acknowledgments Section of your manuscript: “This work was supported by the Chengdu University of Traditional Chinese Medicine in 2025.” We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. 
We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: “The author(s) received no specific funding for this work.” Please include your amended statements within your cover letter; we will change the online submission form on your behalf. 4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 5. We note that there is identifying data in the Supporting Information file < MIAT_Code_and_Data.zip >. Due to the inclusion of these potentially identifying data, we have removed this file from your file inventory. Prior to sharing human research participant data, authors should consult with an ethics committee to ensure data are shared in accordance with participant consent and all applicable local laws. Data sharing should never compromise participant privacy. It is therefore not appropriate to publicly share personally identifiable data on human research participants. The following are examples of data that should not be shared: -Name, initials, physical address -Ages more specific than whole numbers -Internet protocol (IP) address -Specific dates (birth dates, death dates, examination dates, etc.) -Contact information such as phone number or email address -Location data -ID numbers that seem specific (long numbers, include initials, titled “Hospital ID”) rather than random (small numbers in numerical order) Data that are not directly identifying may also be inappropriate to share, as in combination they can become identifying. 
For example, data collected from a small group of participants, vulnerable populations, or private groups should not be shared if they involve indirect identifiers (such as sex, ethnicity, location, etc.) that may risk the identification of study participants. Additional guidance on preparing raw data for publication can be found in our Data Policy (https://journals.plos.org/plosone/s/data-availability#loc-human-research-participant-data-and-other-sensitive-data) and in the following article: http://www.bmj.com/content/340/bmj.c181.long. Please remove or anonymize all personal information, ensure that the data shared are in accordance with participant consent, and re-upload a fully anonymized data set. Please note that spreadsheet columns with personal information must be removed and not hidden as all hidden columns will appear in the published file. 6. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise. Additional Editor Comments: Please respond to all comments and highlight the changes in the revised manuscript. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ******** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ****** 3. 
Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ****** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ****** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: 1. The application of GAI in medicine and nursing is a serious issue. 2. Currently, there is extreme caution globally regarding the use of GAI in medicine due to medicine has a very low tolerance for error, even allowing for minor mistakes. 3. The paper stated that professionals must adopt an "AI-aware" stance. 
They should critically evaluate the algorithm's output, treating it as a fallible "second opinion," not objective truth, ensuring that human judgment remains the ultimate guarantee of fair patient care. This is a very good foundation. Without this layer of safety, even the slightest mishap could cause humanity to lose faith in AI-assisted healthcare. 4. The paper's prioritization of GAI algorithms for debiasing has made a significant contribution to medicine. 5. It's necessary to consider the possibility of major mistakes arising from accumulated facts inferring, while simultaneously relying on artificial intelligence at various levels to remove trivial details in routine tasks. 6. This paper primarily discusses GAI bias in word usage across different models. Besides AI vigilance, a process of decision attribution should be established within the system so that when GAI commits a systemic error, humans can quickly eliminate a series of systemic errors. Reviewer #2: This manuscript employs technical methods to assess the potential implicit biases of various mainstream LLMs in the medical field and their impact on clinical decision-making and health equity. It ingeniously adapts the traditional IAT into test frameworks suitable for large models, demonstrating high innovation. Methodologically, when the authors verified whether implicit associations predict discriminatory decisions, they only conducted a paired-prompt analysis on the DeepSeek-Chat model. This weakens the generalizability of the conclusion. It is suggested to supplement the comparison results of at least one closed-source advanced model, such as GPT-4, to demonstrate that the finding that "implicit associations predict discriminatory decisions" is not confined to a single model architecture. I couldn't find supplementary materials to verify the test data of this research. 
It is suggested that specific examples of the construction of the Prompt in both Chinese and English be provided in the main text to demonstrate what kind of medical scenarios the "situational decision-making" that the model faces in the Relative Decision Test is. This would help clinical physician readers understand the mechanism of AI bias more intuitively. This study did not employ external validation data and did not verify the issue of lexical association bias in the real world. Although the authors raised this issue in the limitations, it is obvious that the same problem was also raised in reference 5. Therefore, as a new study, efforts should be made to address this issue rather than allowing such an obvious defect to persist. The statement mentioned that the data is in the Supplementary. It is necessary to ensure that the code repository containing all the prompt templates and API call parameters, such as the GitHub repository link is fully uploaded before submission to guarantee the reproducibility of the research. Reviewer #3: 1. The LLMs examined in this study were trained on explicit, digitally accessible data, including text and images. However, implicit bias is inherently difficult to measure even within traditional psychological research, and this challenge is further compounded in the context of LLMs. These models undergo alignment training that systematically orients their outputs toward socially desirable responses, making it methodologically difficult to bypass this framework. This constitutes an inherent limitation of the current study's design. 2. Traditional implicit bias measures, such as the IAT, rely on response latency as a core indicator, premised on the assumption that conceptually proximate pairings elicit shorter reaction times. 
The author is suggested to provide supplementary analysis on whether measurable differences in computational response time were observed across models or the underlying distances between the concepts in statistical spaces. If no significant differences were found, the authors should address possible explanations. For instance, that model outputs are fundamentally a function of the current prompt and context, and that computational latency reflects architectural properties rather than cognitive distance. The validity of response time as an index of bias in this context therefore warrants critical discussion. 3. The experimental design includes relatively few scenarios grounded in medical or healthcare-specific contexts. The overall framework resembles a series of forced-choice tasks rather than simulations of authentic clinical decision-making, which may limit the ecological validity of the findings. 4. It is unclear whether all prompts and tasks in this study were administered within the same agent session (i.e., a single conversation thread). If so, the authors should address the potential for context reinforcement across turns, whereby earlier responses may systematically influence subsequent outputs. Additionally, it is worth considering whether the models themselves were pre-trained on literature related to implicit bias measurement. If so, the models may have been capable of inferring the intent of the assessment, potentially confounding the results. 5. In the Relative Decision Task (p.6), the Decision Bias Score is operationalized on a scale of 0 to 1, a design that obscures the directionality of bias. Furthermore, treating bias as a binary construct oversimplifies the phenomenon and fails to account for intersectionality. For example, the compounded effects of race and gender may produce bias patterns that are invisible when each dimension is examined in isolation. 6. 
In the Paired Prompt Analysis, the bias priming of certain prompts is relatively transparent, which risks activating the models' safety filters. In such cases, the outputs may reflect the operation of safety mechanisms rather than the models' underlying bias tendencies, potentially producing false negatives in bias detection. Furthermore, all experimental materials were administered in English, despite the inclusion of culturally sensitive content involving race and comparisons between Western and traditional Chinese medicine. Given that the models tested were developed by companies operating across different linguistic and cultural contexts, it remains an open question whether language and cultural framing systematically influence the nature and direction of the biases elicited. 7. This study makes a novel contribution by translating implicit bias measurement concepts from human psychology to the evaluation of LLMs. First, in classical psychological experiments, the instruction set and testing environment are critical elements for ensuring measurement validity; it should be clarified whether these elements were systematically incorporated into the prompt design. Second, the inclusion of responses from healthcare professionals as a reference benchmark would allow for a meaningful comparison between model outputs and human clinical judgment, and would more directly speak to the stated focus of the paper. 8. Gender and racial implicit bias emerge as central findings in the category-level analysis. However, it is important to consider whether these observed differences may partly reflect established clinical or epidemiological disparities already documented within the medical literature. If so, some portion of the model's outputs may be reproducing statistical patterns present in the training data rather than exhibiting bias in a normative sense. 
Distinguishing between outputs that mirror real-world variation and those that represent bias warranting correction is a fundamental interpretive challenge that the authors should address explicitly. ****** 6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Zih-Ping Ho Reviewer #2: Yes: Hanqing Zhao Reviewer #3: Yes: Yi-Ju Lee ******** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation. NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications. https://doi.org/10.1371/journal.pone.0348819.r001
Revision 1
1 Apr 2026 Author Response A detailed point-by-point response to all editor and reviewer comments has been provided in the uploaded file “Response to Reviewers”. Attachments Attachment Submitted filename: Response_to_Reviewers.pdf https://doi.org/10.1371/journal.pone.0348819.r002
15 Apr 2026 Decision Letter - Thiago P. Fernandes, Editor
PONE-D-26-04379R1
Implicit bias in safety-aligned large language models: A multi-faceted evaluation of clinical decision-making and health equity
PLOS One
Dear Dr. Yu,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please respond to all comments and highlight them in the revised manuscript.
Please submit your revised manuscript by May 30 2026 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
- A letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results.
Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. As the corresponding author, your ORCID iD is verified in the submission system and will appear in the published article. PLOS supports the use of ORCID, and we encourage all coauthors to register for an ORCID iD and use it as well. Please encourage your coauthors to verify their ORCID iD within the submission system before final acceptance, as unverified ORCID iDs will not appear in the published article. Only the individual author can complete the verification step; PLOS staff cannot verify ORCID iDs on behalf of authors. We look forward to receiving your revised manuscript. Kind regards, Thiago P. Fernandes, PhD Academic Editor PLOS One Journal Requirements: 1. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise. 2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. 
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #2: All comments have been addressed Reviewer #3: (No Response) ******** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #2: Yes Reviewer #3: (No Response) ****** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #2: Yes Reviewer #3: (No Response) ****** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. 
participant privacy or use of data from a third party—those must be specified. Reviewer #2: Yes Reviewer #3: (No Response) ****** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #2: Yes Reviewer #3: (No Response) ****** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #2: The author has made active and systematic revisions in response to the previous round of review comments, supplemented some experiments, and the main issues of the paper have been substantially addressed. However, there are still some areas that need improvement. 1. The author reports that the Debiasing Prompt is significantly effective on GPT-4o (P=0.026), but shows no statistical significance on DeepSeek-V3 (P=0.089), without discussing the possible reasons for this difference. 2. It is suggested to clarify: (1) the complete version number of the model; (2) the time interval of API calls; (3) the settings of key sampling parameters such as temperature. Reviewer #3: (No Response) ****** 7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. 
Reviewer #2: Yes: Hanqing Zhao Reviewer #3: Yes: Yi-Ju Lee ******** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation. NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications. https://doi.org/10.1371/journal.pone.0348819.r003
Revision 2
20 Apr 2026 Author Response We have provided a point-by-point response to the reviewer comments and revised the manuscript accordingly. All revised files have been uploaded. Attachments Attachment Submitted filename: Response_to_Reviewers_auresp_2.pdf https://doi.org/10.1371/journal.pone.0348819.r004
22 Apr 2026 Decision Letter - Thiago P. Fernandes, Editor
Implicit bias in safety-aligned large language models: A multi-faceted evaluation of clinical decision-making and health equity
PONE-D-26-04379R2
Dear Dr. Yu,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.
An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. For questions related to billing, please contact billing support.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Thiago P. Fernandes, PhD
Academic Editor
PLOS One
https://doi.org/10.1371/journal.pone.0348819.r005
Formally Accepted
Acceptance Letter - Thiago P. Fernandes, Editor PONE-D-26-04379R2 PLOS One Dear Dr. Yu, I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team. At this stage, our production department will prepare your paper for publication. This includes ensuring the following: * All references, tables, and figures are properly cited * All relevant supporting information is included in the manuscript submission, * There are no issues that prevent the paper from being properly typeset You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps. Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing. If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Thiago P. Fernandes Academic Editor PLOS One https://doi.org/10.1371/journal.pone.0348819.r006

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.