Peer Review History
Original Submission: August 26, 2021
PONE-D-21-27639
SensiMix: Sensitivity-Aware 8-bit Index & 1-bit Value Mixed Precision Quantization for BERT Compression
PLOS ONE

Dear Dr. Kang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Overall, this is a rich and interesting article. It is well written and described. The authors should further improve it by following all the minor comments provided by the three reviewers to meet PLOS ONE publication criteria. Please carefully take into account the comments of all the referees to improve the manuscript so that it meets PLOS ONE's standards before resubmitting it to the journal. Please submit your revised manuscript by Nov 19 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript.

Kind regards,
Sergio Consoli
Academic Editor
PLOS ONE

Journal Requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

2. Thank you for stating the following financial disclosure: [This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2020-0-00894, Flexible and Efficient Model Compression Method for Various Applications and Environments, and No.2017-0-01772, Development of QA systems for Video Story Understanding to pass the Video Turing Test). The Institute of Engineering Research and ICT at Seoul National University provided research facilities for this work.]
Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

3. We note you have included a table to which you do not refer in the text of your manuscript. Please ensure that you refer to Table 2 in your text; if accepted, production will need this reference to link the reader to the Table.

4. Please include a copy of Table ?? which you refer to in your text on pages 13 and 14.

5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: No ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The research presented here is very interesting, and I'm excited to see such nice size and speed gains through smart use of quantization. I feel that the manuscript is mostly clear and concise; however, there are a few areas I would like to see clarified or elaborated on. I'm confused by the terms "8-bit index" and "1-bit value".
If I am understanding correctly, both 8-bit and 1-bit are quantization strategies on the weight matrices. I'm not sure what the index in "8-bit index" refers to, or why it's different from the value in "1-bit value", other than in bit-width. If the difference between the two strategies is only the amount of quantization (and which layers you apply them to), I would suggest you either name them similarly to reflect this (8-bit value and 1-bit value quantization), or just refer to them as 8-bit quantization and 1-bit quantization. In Table 1, you state that matrix W_l^dq is an "8-bit index-de-quantized weight matrix of the layer l". However, my understanding is that your de-quantization process takes a matrix of INT8s and (with the min and max) regenerates an approximation of the original matrix of FP32s. Shouldn't this matrix contain 32-bit values? If not, why not? This also leads to confusion in 8-bit de-quantization, where the de-quantized values are, e.g., -1.28, 0.005, 1.00, and 1.27, but W_l^dq is said to be an 8-bit matrix. How can you store a value of -1.28 inside an INT8? Line 312 then describes it as an 8-bit clip function, but if the input to Alg. 4 was indeed an 8-bit signed integer, the clip function should have no effect. The only way I can make sense of this is if x is actually an FP32 value, but that doesn't agree with what you've previously written about W_l^dq. Line 268: "8-bit index quantization consists of two steps: quantization and de-quantization." This sentence is very confusing. I understand that both quantization and de-quantization are elaborated on later, but I find it confusing to have step 1 of quantization be itself, and step 2 be undoing itself. I think what you're getting at is something like, "In order to utilize 8-bit quantization, SensiMix is able to quantize FP32 values to 8-bit int values, and de-quantize the values back to 32-bit.", but elaboration would make this clearer.
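To make the reviewer's point about data types concrete, here is a minimal sketch of a generic min-max (affine) 8-bit quantization round trip. It assumes an asymmetric scheme whose scale and integer zero point are derived from the tensor's min and max; the function names `quantize_int8` and `dequantize_int8` are hypothetical, and this is not necessarily the exact formula used in SensiMix. It shows that the stored codes are integers in [-128, 127], while the de-quantized values are floating-point approximations of the originals.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integer codes using the tensor's min and max.

    Hypothetical asymmetric (affine) scheme, for illustration only.
    """
    lo, hi = min(values), max(values)
    # 256 integer levels span [lo, hi]; guard against a constant tensor.
    scale = (hi - lo) / 255 if hi > lo else 1.0
    # Integer code that de-quantizes to exactly 0.0.
    zero_point = round(-lo / scale) - 128
    # Round to the nearest level, then clamp into the signed 8-bit range.
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Recover approximate values; the result is float, not an 8-bit integer."""
    return [(q - zero_point) * scale for q in codes]


weights = [-1.28, 0.005, 1.00, 1.27]
codes, scale, zero_point = quantize_int8(weights)
recovered = dequantize_int8(codes, scale, zero_point)
print(codes)
print(recovered)
```

Under this assumed scheme only the codes (plus one scale and one zero point per tensor) would need 8-bit storage; the de-quantized matrix is a float matrix, which is the distinction the reviewer asks the authors to make explicit.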
I have some issues with the differentiability of your clipping functions. In the 8-bit clip, if the inputs are FP32 values, values outside of the range (-128, 127) will be clipped to -128 and 127. However, the clip function is not differentiable at these values. Why is this not a problem? You mention in the 1-bit clip section that most values are near zero, and are unlikely to be near the clipping limits, but I don't have a similar kind of intuition about the 8-bit clipping. I would appreciate it if you would elaborate on why both clip functions being non-differentiable at their min and max is not a problem (or, if it is, explain). Figure 3: I'm really not sure what this figure is trying to convey. Shouldn't the weights after ABWR be clustered at -1 and 1? To my understanding of ABWR, the left and right subfigures seem swapped; the values before ABWR should be clustered around 0, and afterwards they should be clustered around -1 and 1. Could you explain why Fig 3 shows what it does? Line 392: Figure 4 seems to show an FP32 FFN layer being converted into a 1-bit FFN after 1 epoch, but that doesn't really agree with your description of adding MP layers to the bottom (I assume the bottom of the existing MP encoder layers, but you don't specify this). Which is correct for ILF?

Small details: Line 43: you might want to describe 1-bit quantization here, or cite a description. I assume it quantizes values to -1 or 1, but a reader could easily assume it quantizes them to 0 or 1. "For the 1-bit quantization, we apply standard 1-bit quantization." This sentence adds nothing; either explain what 1-bit quantization is, or remove this sentence. Line 146: suggest changing "low-bit numbers" to something like "reduced precision numbers", or "smaller bit-width integers", since low-bit is ambiguous, and to my ears "low-bit" refers to the position of a bit, not the width of an integer. Line 452: a table link is broken. Line 520: broken table link.
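The reviewer's question about the clip functions can be illustrated with a convention commonly used in quantization-aware training, the straight-through estimator (STE): the forward pass clips, and the backward pass treats the clip as identity inside the range and zero outside it. The sketch below assumes this convention and assumes 1-bit quantization maps to {-1, +1}, as the reviewer does; all helper names are hypothetical, and this is not necessarily how SensiMix implements its clip functions. Each kink is a single point, so continuous FP32 inputs rarely land exactly on it, and the backward rule simply fixes a value there.

```python
def clip_forward(x, lo, hi):
    """Forward pass: clamp x into [lo, hi]."""
    return max(lo, min(hi, x))


def clip_backward(x, upstream_grad, lo, hi):
    """Backward pass for clip under a straight-through-style convention.

    The true derivative is 1 strictly inside (lo, hi) and 0 outside it.
    At the two kinks (x == lo or x == hi) no derivative exists, so this
    sketch chooses to pass the gradient through; any value in [0, 1]
    would also be a defensible choice there.
    """
    return upstream_grad if lo <= x <= hi else 0.0


def sign_forward(x):
    """1-bit quantization to {-1, +1} (sign, with 0 mapped to +1)."""
    return 1.0 if x >= 0 else -1.0


def sign_backward(x, upstream_grad):
    """sign() has zero gradient almost everywhere, so the STE reuses
    the clip gradient on [-1, 1] instead."""
    return clip_backward(x, upstream_grad, -1.0, 1.0)


# 8-bit-range clip: a value far outside the range receives no gradient.
print(clip_forward(200.0, -128, 127), clip_backward(200.0, 1.0, -128, 127))
# 1-bit case: weights near zero receive the full upstream gradient.
print(sign_forward(0.3), sign_backward(0.3, 2.0))
```

Under this convention the non-differentiable points do not stop training; the practical effect of clipping is that out-of-range inputs stop receiving gradient, which is the behavior the reviewer asks the authors to justify for the 8-bit case.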
Reviewer #2: The authors present a method for adaptive weight quantization for the BERT model. The results support the novelty of their method and the paper is well-written and easy to follow. Here are a few comments to help the authors revise the manuscript:
1) Outside the BERT domain there seems to be a body of literature on Mixed Precision Quantization based on the sensitivity of the outputs, e.g., these came up with a simple search: https://ieeexplore.ieee.org/document/9207413, https://arxiv.org/abs/2103.10051
2) There also seems to be more literature on quantization of BERT models, e.g., https://arxiv.org/abs/2101.01321, https://arxiv.org/abs/2101.05938, https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/reports/custom/15742249.pdf, https://arxiv.org/pdf/2010.07109.pdf, which seems to be ignored by the authors. Could the authors elaborate more on this to stress their novelty?
3) The table references on lines 452, 520, and 538 are missing.
4) In the questions outlined in Experiments, the section numbering is missing.

Reviewer #3:
(1) The term BERT should be defined before its first use.
(2) The introduction was well prepared (well done!)
(3) The section on related works should be expanded.
(4) The section "Related Work" should be "Related Works".
(5) The description of the proposed method SENSIMIX should be expanded to include an algorithmic description in the form of a flowchart or pseudocode. This is necessary to see the actual operation of SENSIMIX.
(6) The authors did nice work with the textual description, but the work should be expanded to include diagrammatic illustrations.
(7) Is there any way that a Genetic Algorithm (GA) can be used for model reduction in this work?
(8) Both 8-bit and MP encoders should be described with a flowchart or pseudocode.
(9) How is the FP32 matrix in line 276 represented? (In terms of matrix)
(10) Why is the method in Equation 6 used? Is it better than the ADAM method, or SGM?
(11) Overall, this is a very rich article that is well written and described. The authors should include more diagrams and add another section describing the computational platform (software) used. ********** 6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: Yes: Oluleye Hezekiah Babatunde [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 1
PONE-D-21-27639R1
SensiMix: Sensitivity-Aware 8-bit Index & 1-bit Value Mixed Precision Quantization for BERT Compression
PLOS ONE

Dear Dr. Kang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. The paper has improved noticeably, and its contents are of interest to the community. There are, however, still points to be addressed before the manuscript can reach an acceptable standard for publication. In particular, make sure to address the comment by R2, who raised a concern about the background section and the experimental evaluation relative to other methods in the literature, which calls the novelty of the proposed method into question. If an additional experimental evaluation is not possible, at least a full literature review of quantization methods for neural nets should be reported in the background section. Such a literature review should explain the main differences with each method and, where applicable, why it was not included in the authors' comparison. It should also be stressed why the proposed SensiMix method is novel relative to these methods. Please carefully take into account the comments of all the referees to improve the manuscript so that it meets PLOS ONE standards before resubmitting it to the journal. Please submit your revised manuscript by Feb 17 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript.

Kind regards,
Sergio Consoli
Academic Editor
PLOS ONE

Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #2: (No Response) Reviewer #4: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #2: Partly Reviewer #4: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #2: Yes Reviewer #4: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #2: Yes Reviewer #4: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #2: Yes Reviewer #4: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. 
(Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #2: Unfortunately, the authors have not addressed my concerns with regard to comparison with other quantization methods, making it very difficult to judge the actual novelty of the paper. If an experimental evaluation is not possible, I think there should at least be a full literature review of quantization methods for neural nets in the background section. Such a literature review should explain the differences from each one, why it wasn't used for comparison, and why SensiMix is novel. Reviewer #4: The authors propose an acceleration of the BERT family of models using quantization techniques. Although the contribution is not difficult to pursue, the results are promising in practice. Therefore, I believe that the paper has merit to be accepted in a multidisciplinary journal such as PLOS ONE. I suggest introducing and discussing recent references on how to increase the computational efficiency of BERT from different perspectives. Some examples are: "BERT Pre-training Acceleration Algorithm Based on MASK Mechanism"; "Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing"; "Plug-Tagger: A Pluggable Sequence Labeling Framework Using Language Models"; "A Comprehensive Survey on Training Acceleration for Large Machine Learning Models in IoTs". ********** 7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #2: No Reviewer #4: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site.
Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 2
SensiMix: Sensitivity-Aware 8-bit Index & 1-bit Value Mixed Precision Quantization for BERT Compression
PONE-D-21-27639R2

Dear Dr. Kang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Sergio Consoli
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #2: All comments have been addressed Reviewer #4: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #2: Partly Reviewer #4: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #2: Yes Reviewer #4: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #2: Yes Reviewer #4: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #2: Yes Reviewer #4: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. 
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #2: The authors have added a detailed background section referring to similar works and addressing my earlier concerns. Reviewer #4: The authors added my suggestions and improved the manuscript in this review round. Congratulations on your work! ********** 7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #2: No Reviewer #4: No
Formally Accepted
PONE-D-21-27639R2
SensiMix: Sensitivity-Aware 8-bit Index & 1-bit Value Mixed Precision Quantization for BERT Compression

Dear Dr. Kang:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Sergio Consoli
Academic Editor
PLOS ONE
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.