Peer Review History

Original Submission
July 24, 2019
Decision Letter - Weixiong Zhang, Editor, Ferhat Ay, Editor

Dear Dr Bo,

Thank you very much for submitting your manuscript 'DeepHiC: A Generative Adversarial Network for Enhancing Hi-C Data Resolution' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, see here.

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Ferhat Ay, Ph.D.

Associate Editor

PLOS Computational Biology

Weixiong Zhang

Deputy Editor

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors present a method, named DeepHiC, that aims to increase the effective sequencing depth of Hi-C experiments. The method relies on a pair of neural networks that are trained in an adversarial manner similar to image super-resolution. While the paper is very clearly written and easy to follow, there are several critical issues that should be addressed. Further, while the authors demonstrate in several contexts that DeepHiC outperforms previous work, the manuscript may benefit from a further discussion of the ramifications of these improvements.

Major

- While the general problem of increasing the effective sequencing depth of Hi-C experiments is important, the specific formulation considered by the authors is problematic. Specifically, it will never be the case that one has high resolution data for a few chromosomes and needs to improve the quality of the remainder (the cross-chromosomal setting). Rather, a user would like a model that had been trained on high quality data from one cell type and could then be used to improve the quality of other cell types (the cross-cell type setting). This distinction is important because, in the cross-cell type setting, a strong baseline is simply using the high resolution contact map from the cell type used to train the model. The Hi-C+ paper addresses this setting explicitly.

- The authors should clarify what precisely constituted the ground truth labels. Specifically, was the Hi-C data that had outliers removed and then min-max scaled used solely for training, or was it also considered to be the ground truth? Considering these transformed values to be the ground truth would be unfair to the competing models that had been pre-trained on data sets which were processed in a different manner. Supplementary Note 1 suggests that the output from the models are not converted back to the original space because their ranges differ, with DeepHiC ranging from 0-1, and other methods ranging from 0-100. Further, the output from the webserver is between 0 and 1. It is okay for DeepHiC to be trained using data processed in a different manner than other approaches, but all approaches should be evaluated on the original Hi-C counts to ensure fairness. Additionally, because most outliers tend to occur at low genomic distances, squashing these outliers only for DeepHiC may help explain why it outperforms the other models by so much at low genomic distances.

- It is unclear that performing better than an experimental replicate by such a large margin across all genomic distances is a reasonable result. The authors should comment on what such a result means and why there is such large variance between replicates. Generally, one would think of an experimental replicate as almost an upper bound of performance. This result suggests that it is significantly better to use DeepHiC on a Hi-C map with low sequencing depth than to perform the actual high sequencing depth experiment. This may be the actual result, but it is a significant claim and should have more supporting evidence.

- When a researcher has a Hi-C map with moderate coverage, they will likely want to know by what factor they should improve the effective coverage of that map in order to discover various features of chromatin architecture. For example, should they use a model that improves coverage by 100x (by training using 1% of the reads) or only by 16x (by training using ~6% of the reads)? Is it always better to use a model that improves coverage the most, or does improvement plateau after a certain amount of boost? It would be useful to readers if the authors could provide guidance on which model is appropriate in which setting, preferably by showing the performance on downstream tasks of models trained to boost coverage by different amounts.

Minor

- The webserver did not work when tried.

- An important distinction that should be investigated is whether the contact maps yielded by DeepHiC are better because they are effectively higher sequencing depth or because the generative network simply cleans the data. This question could be answered by training a model in the same manner as DeepHiC, but having the target be the same as the input, similar to an autoencoder. This model would demonstrate the performance one gets from cleaning the data. Thus, any improvement that DeepHiC has over it could reasonably be attributed to the importance of a higher sequencing depth.

- The authors clearly describe the architecture, hyperparameters, and composition of loss functions of the models that compose DeepHiC. However, this naturally leads to the question of how these choices were made. While it is not necessary for the authors to perform a grid search using a validation set, they should at least comment on how these values were determined and clarify that they were not chosen based on performance on the test set.

- The SSIM score is difficult to interpret as the primary evaluation metric. While the authors soundly reason that using MSE as the objective function during training may yield overly smooth predictions, there is no reason not to use it to evaluate trained models, particularly because it is much more interpretable to most readers.
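To illustrate the interpretability contrast the reviewer raises, the two metrics can be compared with a short NumPy sketch. The `global_ssim` below is a simplified single-window version of SSIM, not the sliding-window implementation any of the papers use, and the matrices are synthetic illustrations rather than real Hi-C data:

```python
import numpy as np

def mse(x, y):
    """Mean squared error: the average squared deviation, expressed in
    (squared) units of the contact values, so directly interpretable."""
    return float(np.mean((x - y) ** 2))

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over the whole matrix (a simplification of the
    usual sliding-window SSIM; inputs assumed scaled to [0, 1])."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

# Illustrative matrices (not real Hi-C data): a "truth" map and a noisy copy.
rng = np.random.default_rng(0)
truth = rng.random((40, 40))
noisy = np.clip(truth + rng.normal(0, 0.1, truth.shape), 0, 1)

print(f"MSE  (truth vs noisy): {mse(truth, noisy):.4f}")
print(f"SSIM (truth vs noisy): {global_ssim(truth, noisy):.4f}")
```

The MSE reads directly as "average squared error per bin," whereas the SSIM value combines luminance, contrast, and structure terms into a single score between -1 and 1, which makes the practical meaning of, say, a 0.05 improvement harder to convey to readers.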

- It is unclear how predictions were made in a genome-wide fashion. Was a cross-validation approach used, as alluded to in the methods? This should be explicitly stated in the text and include the number of folds. Similarly, in the final results section, the data set used to train the DeepHiC model should be briefly described.

- The term "(q-value < 0.5-percentile)" is difficult to parse. Was it a q-value < 0.5 that was used, or were the bottom 0.5% of q-values used?

Reviewer #2: Review is uploaded as an attachment.

Reviewer #3: Hong, Jiang et al. in the manuscript "DeepHiC: A Generative Adversarial Network for Enhancing Hi-C Data Resolution" implement a generative adversarial network that interpolates low-resolution Hi-C maps to produce higher-resolution maps. Their implementation offers some quantitative improvements over previously published methods. Unfortunately, the manuscript does not offer sufficient biological or methodological insight to justify publication in this journal. Their success in interpolating the data confirms the low amount of information entropy present in large Hi-C data sets, as was previously shown. As a consequence, the manuscript does not seem of high importance to researchers in the field. I suggest submitting the manuscript to a more technical journal.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Oana Ursu

Reviewer #3: No

Attachments
Submitted filename: 2019DeepHiCReview.pdf
Revision 1

Attachments
Submitted filename: Response_to_Reviewers.docx
Decision Letter - William Stafford Noble, Editor, Ferhat Ay, Editor

Dear Dr Bo,

We are pleased to inform you that your manuscript 'DeepHiC: A Generative Adversarial Network for Enhancing Hi-C Data Resolution' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes. Also, please address the minor suggestions from one of the reviewers and perform a thorough grammatical check.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pcompbiol/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process.

One of the goals of PLOS is to make science accessible to educators and the public. PLOS staff issue occasional press releases and make early versions of PLOS Computational Biology articles available to science writers and journalists. PLOS staff also collaborate with Communication and Public Information Offices and would be happy to work with the relevant people at your institution or funding agency. If your institution or funding agency is interested in promoting your findings, please ask them to coordinate their releases with PLOS (contact ploscompbiol@plos.org).

Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Computational Biology.

Sincerely,

Ferhat Ay, Ph.D.

Associate Editor

PLOS Computational Biology

Weixiong Zhang

Deputy Editor

PLOS Computational Biology

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I am happy to report that the authors appear to have put a great deal of effort into their revision and have addressed all of my concerns. I would also like to apologize to the authors for my comment that their webserver did not work. The webserver did indeed work but took some time to run and I had forgotten to remove the comment after the webserver finished running.

Reviewer #2: Review is uploaded as an attachment.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jacob Schreiber

Reviewer #2: Yes: Oana Ursu

Attachments
Submitted filename: 2019-12-17_re-review.pdf
Formally Accepted
Acceptance Letter - William Stafford Noble, Editor, Ferhat Ay, Editor

PCOMPBIOL-D-19-01246R1

DeepHiC: A Generative Adversarial Network for Enhancing Hi-C Data Resolution

Dear Dr Bo,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.