Peer Review History

Original Submission: July 10, 2021
Decision Letter - Heather Mattie, Editor, Hamish S Fraser, Editor

PDIG-D-21-00034

Perpetuating Healthcare Disparities through Bias in Artificial Intelligence – a Global Review

PLOS Digital Health

Dear Dr. Mitchell,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has considerable merit but does not fully meet PLOS Digital Health’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please review the reviewer comments, add clarifying text where needed, and consider a slight change to the title.

Please submit your revised manuscript by Nov 26 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Heather Mattie

Academic Editor

PLOS Digital Health

Journal Requirements:

1. Please amend your detailed Financial Disclosure statement. This statement is published with the article and should therefore be written in full sentences, containing the exact wording you wish to be published.

i). Please include all sources of funding (financial or material support) for your study. List the grants (with grant number) or organizations (with url) that supported your study, including funding received from your institution. 

ii). State the initials, alongside each funding source, of each author to receive each grant.

iii). State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

iv). If any authors received a salary from any of your funders, please state which authors and which funders.

If you did not receive any funding for this study, please simply state: “The authors received no specific funding for this work.”

2. Please update the completed 'Competing Interests' statement, including any COIs declared by your co-authors. If you have no competing interests to declare, please state "The authors have declared that no competing interests exist". Otherwise please declare all competing interests beginning with the statement "I have read the journal's policy and the authors of this manuscript have the following competing interests:"

3. Please provide separate figure files in .tif or .eps format only and remove any figures embedded in your manuscript file.  Please ensure that all files are under our size limit of 20MB.  

For more information about how to convert your figure files please see our guidelines: https://journals.plos.org/digitalhealth/s/figures

Once you've converted your files to .tif or .eps, please also make sure that your figures meet our format requirements.

4. Tables should not be uploaded as individual files. Please remove these files and include the tables in your manuscript file.

5. We have noticed that you have cited 'Supplementary Figure X' in the manuscript file. However, there is no corresponding file uploaded to the submission. Please ensure that all files are present to ensure that your paper is fully reviewed.

6. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

This is a timely piece, and with the addition of text descriptions and clarifications, and perhaps a slight change in the title, it will be a great contribution to the literature. In particular, please add reflections on potential gaps in the available methods for detecting bias in medical ML studies, such as authors' nationality versus the country of their current institutional affiliation, and the accuracy of gender inferred from first names across multiple ethnicities.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

Reviewer #3: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: PERPETUATING HEALTHCARE DISPARITIES THROUGH BIAS IN ARTIFICIAL INTELLIGENCE – A GLOBAL REVIEW

Review

The study authors posit that disparities in the data sources of clinical AI studies published in 2019 and indexed in PubMed (namely, the country or countries where each study was done and the specialty, nationality, sex, and expertise of the researchers) lead to bias, which in turn perpetuates "healthcare disparities" in general, and specifically a disparity between the applicability of study findings to the healthcare of the study population (the data source of the study) and to the healthcare of the general disease population.

The methodology of a typical AI clinical study looks like this:

A clinical question regarding a disease is generated based on a real-life situation --> a hypothesis (H1) is generated --> data are gathered by the researcher from a study population with said disease --> the data are split 70:20:10 (70% of the data is used for model development, 20% for model validation, and 10% for model testing; augmentation is sometimes done to balance the data) --> the model is trained until performance is above 90% --> the validation dataset is used to refine the model --> more and more data are gathered for training and testing until testing performance is higher than 80% --> the algorithm is tested on a bigger or different population with the same disease --> the clinical question is considered answered once the ROC is 0.80 or higher in the general disease population --> new knowledge is added to the current pool of knowledge regarding the disease in question.
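For concreteness, the 70:20:10 split step described above can be sketched in a few lines. This is an illustration only, assuming Python and scikit-learn (neither of which is specified in the review) and using placeholder data:

```python
# Minimal sketch of a 70:20:10 train/validation/test split (placeholder data).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 20)         # hypothetical feature matrix
y = np.random.randint(0, 2, 1000)    # hypothetical binary labels

# First hold out 10% of the data as the final test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=42)

# Split the remaining 90% so the overall proportions are 70% train / 20%
# validation (2/9 of the remainder gives 20% of the original data).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=2 / 9, stratify=y_rest, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 200 100
```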

The sources of bias in AI research methodology that were highlighted in the body of this paper (but not clearly stated in the title) are found in the beginning of any AI clinical study. This is the part of the methodology where humans are involved—in the conception of the AI clinical study design and in the generation of the hypothesis (H1) to be tested. Another source of bias is the creation of the AI model that fails to take into account inherent differences in disease characteristics in human populations from which training, internal validation and testing data is gathered vs. the disease characteristics in human populations where the study findings will be applied.

What may be a more suitable title, based on the methodology and findings of this study, is SOURCES OF BIAS IN ARTIFICIAL INTELLIGENCE CLINICAL STUDIES THAT PERPETUATE DISPARITIES IN HEALTHCARE. This study's authors could then go on to identify two sources of bias in AI-powered studies: 1) the researchers who ask the clinical question and gather and analyze relevant data (the authors' clinical specialty, nationality, sex, and expertise may introduce bias), and 2) the population of interest from whom the clinical data are gathered, which may have disease characteristics different from the general disease population to whom the study findings will be applied (the country source). A more thorough review of the literature on the existence of disparities in global healthcare would serve as a relevant background to this study.

Regarding the limitations of the study: this study uses AI to study the biases inherent in AI clinical studies, which raises the question of whether the same biases apply to non-clinical AI studies such as this one. Are the researchers of this non-clinical AI study biased because their study was conducted in the United States, because of the nationality (which needs to be better defined) of the first and last authors, because of the gender of the first and last authors, and because of their expertise? An introspective answer to these questions may further help prove the point of this study: that all humans are biased, and that these biases affect the direction of healthcare research and its application in the real world.

Reviewer #2: Thank you for the opportunity to review your paper on this important topic. The work behind this is considerable.

However, I think it needs a re-think in terms of context. The paper presents a single view, and I think it needs a more balanced one - in effect, the paper risks losing impact because it reads as biased itself.

Much of the current general research literature is arguably biased or potentially biased - in its populations, settings, authors, and institutions. AI-based research therefore needs to be viewed through this lens. It is because of this bias that one of the key tenets of Evidence-Based Medicine is the question 'does this apply to my patient?'. AI research should be viewed through the same lens. Referring back to EBM puts AI-based research into a similar context to other research, making it potentially more digestible and more mainstream.

The choice of current institution as representing the world view of the authors is potentially misleading. Looking at the authors of this paper, there is broad cultural diversity that is not represented in the list of affiliations; the tool is therefore creating its own bias. This needs to be recognised and discussed. The same may apply to the name-based gender API.
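To make this concern concrete, a typical name-based gender-inference step looks roughly like the sketch below. This is an assumption for illustration only: neither the API shown (genderize.io) nor the 0.9 probability cut-off comes from the manuscript. Treating low-confidence names as "unknown" at least surfaces the ambiguity the reviewer describes, rather than forcing a binary label onto names that are common across genders and ethnicities.

```python
# Illustrative sketch (assumed API and threshold, not the authors' pipeline):
# query a public name-to-gender service and refuse to label uncertain names.
import requests

def infer_gender(first_name: str, threshold: float = 0.9) -> str:
    resp = requests.get("https://api.genderize.io", params={"name": first_name})
    resp.raise_for_status()
    data = resp.json()
    # Names used for both women and men, or names rare in the service's
    # training data, return low probability (or no gender at all).
    if data.get("gender") is None or data.get("probability", 0.0) < threshold:
        return "unknown"
    return data["gender"]
```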

There are some developing tools for detecting bias in datasets – a capability largely unavailable to previous non-AI research. Although these tools are still not often used, in a paper on bias I think their existence should be recognized and their use encouraged.
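As one example of such a tool (the review names none; Fairlearn is an assumption chosen for illustration), comparing a simple metric across subgroups can flag candidate biases in a labelled dataset or model:

```python
# Illustrative sketch: subgroup metric comparison with Fairlearn (placeholder
# data; large gaps between groups flag possible bias worth investigating).
import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)        # placeholder labels
y_pred = rng.integers(0, 2, 500)        # placeholder model predictions
group = rng.choice(["A", "B"], 500)     # e.g. a demographic attribute

mf = MetricFrame(metrics={"selection_rate": selection_rate},
                 y_true=y_true, y_pred=y_pred, sensitive_features=group)
print(mf.by_group)  # per-group rates; disparities here warrant a closer look
```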

The growing availability of datasets also needs a balanced view. While noting the real and potential biases in these datasets, it is important to recognize that they have already made a valuable contribution to the field, and their ongoing development and use should be encouraged rather than discouraged, with the limitations noted. Their availability attracts investment of time and effort, which potentially improves the opportunity to democratize healthcare.

The ability to transfer-learn, and thus customize models to local populations, needs more attention. This technique allows the power of large datasets to help inform smaller ones, making it simpler to get a return on investment in lower-resource environments and thereby encouraging development in this area. It also leverages the investment made in resource-rich environments. Also of note: for most non-AI interventions this opportunity rarely, if ever, exists – either to repeat the intervention or to tailor it to the local population and parameters.
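A minimal sketch of the transfer-learning pattern the reviewer describes (assuming TensorFlow/Keras and an ImageNet-pretrained backbone; none of these specifics come from the review): freeze a base trained on a large dataset, and fine-tune only a small head on the local, smaller dataset.

```python
# Illustrative transfer-learning sketch: reuse large-dataset features,
# train only a lightweight head on local data.
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # hypothetical local binary task
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
# model.fit(local_train_ds, validation_data=local_val_ds, epochs=5)
# (local_train_ds / local_val_ds are hypothetical local datasets.)
```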

The topic is important. I think a more balanced view, including highlighting the opportunities along with potential fixes as well as the risks, is important to get this message across.

Reviewer #3: This is a very important and interesting manuscript about global disparities arising through bias in AI. The authors address the importance of AI in medical care through a rigorous and extensive review and the application of transfer-learning techniques. They found factors that could account for disparities in the building of AI and its application in medical care.

The authors mention in the abstract that the databases came mainly from the USA and China. Looking at the results, it is noticeable that the databases, at least those in the first 10 places, were from high-income countries. It is important to add a sentence to the abstract noting this observation, because the authors emphasize in the interpretation (abstract) that it is important to develop infrastructure for AI in data-poor regions.

On page 16 the authors discuss the over-representation of radiology; the same argument, "access to image data", applies to pathology, the other over-represented specialty.

The authors do not mention, as a limitation of the gender analysis, that in some countries the same names are used for both women and men.

Even though AI is very important in medicine, physicians do not (at least not in many countries) acquire basic knowledge of AI as medical students. The same happens with specialists. Since the manuscript focuses on disparities, it would be worth mentioning the handicap that medical students have in AI knowledge; it explains why a high proportion of the main AI authors are not clinicians.

Even though I have accepted the manuscript, I would like the authors to state in the abstract that the first ten databases came from high-income countries.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: BEATRICE TIANGCO

Reviewer #2: No

Reviewer #3: Yes: Cleva Villanueva

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Attachments
Submitted filename: 10nov21_ML_reply.docx
Decision Letter - Hamish S Fraser, Editor

Sources of Bias in Artificial Intelligence that Perpetuate Healthcare Disparities - a Global Review

PDIG-D-21-00034R1

Dear Dr Mitchell,

We are pleased to inform you that your manuscript 'Sources of Bias in Artificial Intelligence that Perpetuate Healthcare Disparities - a Global Review' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted, you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Hamish S Fraser, MBCHB MSc

Section Editor

PLOS Digital Health

***********************************************************

Thank you for addressing the reviewer comments. We are happy to accept the manuscript.

Reviewer Comments (if any, and for reference):

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: No further comments

Reviewer #3: This reviewer has carefully read and analyzed the second submission of manuscript 00034R to PLOS Digital Health. The manuscript addresses an important issue in modern healthcare: artificial intelligence and the bias that affects equality in healthcare.

The authors have properly answered the reviewers' questions and have changed the manuscript accordingly. The manuscript is understandable, presents important data, and points out areas that should change in the application of AI to medicine and healthcare. This reviewer noticed only one word that it would be convenient to change: the word "sex" could be changed to "gender". In this reviewer's opinion, the manuscript meets all the criteria for publication in PLOS Digital Health.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Beatrice J. Tiangco

Reviewer #3: Yes: Cleva Villanueva

**********

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.