PyHIST: A Histological Image Segmentation Tool

Manuel Muñoz-Aguirre; Vasilis F. Ntasis; Santiago Rojas; Roderic Guigó

doi:10.1371/journal.pcbi.1008349

Peer Review History

Original Submission May 20, 2020
17 Jul 2020 Decision Letter - Dina Schneidman-Duhovny, Editor

Dear Muñoz-Aguirre,

Thank you very much for submitting your manuscript "PyHIST: A Histological Image Segmentation Tool" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission.
We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,
Dina Schneidman-Duhovny
Software Editor
PLOS Computational Biology

*********************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: The paper introduces a simple and easy-to-use open-source tool for slide tiling, which is useful for histopathology image analysis. I have to acknowledge there are currently inadequate tools available that make tiling of digital slides simple, and I had to write my own scripts to tile my slides when analyzing whole slide images. Overall, my experience with the application was positive; I was able to create tiles from svs and TIFF files in a short amount of time, with a short learning curve. The setup and installation were relatively easy; the Docker version of the software worked without problems on Ubuntu and Windows 10 based machines. A few problems were encountered when the program was installed through Anaconda using the program’s accompanying installation instructions: in the Ubuntu machine, ‘cv2’ was reported to be missing; in Windows 10, several libraries were missing. I was able to get the program running in Ubuntu and Anaconda after manually installing OpenCV. I did not attempt to manually fix the Anaconda installation in Windows due to the large number of missing libraries. I am not sure if this is due to the unique setup of my computer, or because the program was not tested with Windows 10 and Anaconda.

Suggested Correction: Lines 22-23 state “Histopathological images are routinely used in the diagnosis of many diseases, notably cancer.” This can be misinterpreted as saying that pathologists make their diagnoses predominantly through whole slide images (WSIs).
Although WSI is becoming more widespread in pathology departments, most pathologists still render their diagnoses by examining glass slides under a microscope. This statement has to be corrected/modified to reflect that whole slide images are still not being used by the majority of pathologists to sign out their cases, although there is an increasing adoption of whole slide scanning technologies in pathology departments.

Future direction: There is more potential in this software, which can accommodate additional features in the future while retaining its simplicity. Aside from adding new features, I believe adding a graphical user interface (GUI) version of the program would increase the application’s user base, and be helpful for those who are less computer savvy and have no experience in using the command line.

Reviewer #2: The manuscript submitted by Muñoz-Aguirre and colleagues aims to describe the development of PyHIST, which is a histological image segmentation tool. Overall, this manuscript presents results that would be of interest to the community of scientists and computational biologists concerned with this problem. However, there are major issues in this manuscript that prevent us from recommending that this manuscript be accepted in its current state.

Major:

1) Abstract: highlight that the preprocessing enabled by PyHIST involves image scaling, segmentation, and eventually tile extraction, so that the utility of PyHIST is clearly stated.

2) Introduction: The paper correctly addresses the need for standardization of the tiling and patch-creating pipeline for researchers working in this area to prevent dataset-specific biases. As far as saving research time is concerned, however, WSI preprocessing currently requires developing custom scripts, but once a process is established researchers can typically reuse similar code for subsequent tiling in all projects. Therefore, PyHIST may only save a significant amount of time at the initial phase.
3) Facts have been mentioned without references – we have mentioned a few examples but urge the authors to add extensive references:
- line 22 (citation for WSI obtaining process required),
- line 23 (citation for use in cancer),
- line 25 (citation to support the claim of development of computational methods for disease diagnosis and classification), and
- line 33 (cite literature to support histopathological images capturing endophenotypes that provide crucial information when correlated with molecular and cellular data).
- In a similar way, kindly provide references at lines 37, 46, and 50.

4) Design and Implementation:
- It’s not clear why the authors are interested in highlighting edges within tissue fragments rather than outlining the entire fragment. Figure 1b resembles a grayscale WSI. A similar result to Figure 1b can be reached with less computation by just binarizing the WSI using a threshold to separate background from foreground. Does edge detection provide any unique benefits over binarizing the WSI?
- The graph-based segmentation algorithm can perform unsupervised segmentation on complex images, but in this case the algorithm just needs to detect the connected objects. If the input image is a binary mask (foreground and background), there are many simple functions to label contiguous/interconnected objects and produce an output similar to Figure 1c. Is graph-based segmentation used because it works well with edge detection inputs? How does it compare computationally to other connected-component labeling techniques such as Python Skimage’s measure.label function?
- Why are steps (b) and (c) needed in the PyHIST pipeline in Figure 1? Red gridlines still appear to tile the entire WSI and then some tiles are not stored based on a background threshold. How are the tissue fragment labels from (c) used?

5) Results:
- Details of the deep learning model have not been provided – patches detected correctly have vague histology that is shown in Figure 2 A (explained below).
We suggest a pathologist review of the deep learning model results. Additionally, the connection between a better model accuracy on the dataset and the validity of the pre-processing steps has not been made.
- The partitioning of training and test sets can be one of the most time-consuming pre-processing steps of the ML process. Tiles from the same WSI should be constrained to the training or test sets. It is difficult to satisfy this constraint, while also managing the percentage of tiles in the test set and class imbalances. This process is not a built-in feature of PyHIST and it is unclear in the paper if PyHIST assists with this aspect of the ML pipeline at all.
- The deep learning results are an example that tiles processed using PyHIST can achieve high prediction performance, but it doesn’t necessarily prove that it is better than other baseline or competing approaches. WSIs from different parts of the body can be quite distinguishable, so many different tiling approaches could produce similar results. The Results section could include comparisons of performance and computation time for several tiling methodologies. How does PyHIST stack up against other techniques?

6) Availability and future directions:
- The SVS limitation is mentioned here but should also be addressed earlier in the Intro or Design sections. For example, “PyHIST is currently limited to only SVS format due to/because…”.

7) Figures:
- Figure 2 A: histology is ambiguous since the top panel for ‘T-brain’ shows artefactual tissue rather than brain tissue with cell bodies of neurons or glia etc. This is repeated for the 3rd, 4th and 5th (from left) T-breast panels, and the 1st (from left) T-colon panel.

8) Supplementary Materials:
- Section S3: cropping the image tiles is mentioned – what is the size of these crops and are these kept uniform each time? An explanation is required for clarity.
- Section S2: the segmentation parameters seem to be an important part of tiling, but it is still unclear how they work.
Is this a way of capturing tiles that have background in a certain orientation? How do parameters for border and corners interact with the background percentage and how does this influence segmentation?

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jerome Cheng
Reviewer #2: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information.
This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

https://doi.org/10.1371/journal.pcbi.1008349.r001
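
Illustrative note on Reviewer #2's point 4: the alternative raised there (binarize the image with a threshold, then label connected foreground objects) can be sketched in a few lines. This is only a minimal sketch of the reviewer's suggestion, not PyHIST's method, which according to the review uses edge detection followed by graph-based segmentation. The function names `binarize` and `label_components`, the threshold of 200, and the toy intensity grid are all hypothetical; the sketch uses only the Python standard library and stands in for library routines such as the skimage measure.label function the reviewer names.

```python
from collections import deque


def binarize(gray, threshold):
    """Mark pixels darker than `threshold` as foreground (1), the rest as background (0)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def label_components(mask):
    """Label 4-connected foreground regions with 1, 2, ...; return (labels, count)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or labels[r][c] != 0:
                continue  # background, or already assigned to a region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the up/down/left/right neighbours.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical grayscale thumbnail: bright background, two dark "tissue" fragments.
gray = [
    [250, 250, 250, 250, 250, 250],
    [250,  90,  80, 250, 250, 250],
    [250, 100, 250, 250, 120, 250],
    [250, 250, 250, 250, 110, 250],
]
mask = binarize(gray, 200)  # 200 is an arbitrary illustrative threshold
labels, fragment_count = label_components(mask)
print(fragment_count)
```

On real slides a fixed global threshold is sensitive to staining and scanner differences, which is one reason a comparison against an edge-based pipeline, as the reviewer requests, would be informative.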
Revision 1
28 Aug 2020 Author Response

Attachments
Submitted filename: Reviewers_comments_R1.pdf

https://doi.org/10.1371/journal.pcbi.1008349.r002
17 Sep 2020 Decision Letter - Dina Schneidman-Duhovny, Editor

Dear Muñoz-Aguirre,

We are pleased to inform you that your manuscript 'PyHIST: A Histological Image Segmentation Tool' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,
Dina Schneidman
Software Editor
PLOS Computational Biology

*********************************************************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: In the revised and significantly improved version of the manuscript, the authors addressed each reviewer's concerns, and all of my previous comments have been satisfactorily addressed. I do not have any new recommendations.

Reviewer #2: The revised manuscript submitted by Muñoz-Aguirre and colleagues extensively addresses the comments raised by the reviewers.
We commend them for adding detailed methods regarding pre-processing including tile extraction, additional relevant references, and mask comparisons in Supplementary Text S1 and Supplementary Figure S3. Further, the edits made to Figure 2 have made the message clearer, and the authors have done a remarkable job.

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jerome Cheng
Reviewer #2: Yes: Sana Syed

https://doi.org/10.1371/journal.pcbi.1008349.r003
Formally Accepted
9 Oct 2020 Acceptance Letter - Dina Schneidman-Duhovny, Editor

PCOMPBIOL-D-20-00862R1
PyHIST: A Histological Image Segmentation Tool

Dear Dr Muñoz-Aguirre,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,
Matt Lyles
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

https://doi.org/10.1371/journal.pcbi.1008349.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.