Peer Review History

Original Submission: February 3, 2020
Decision Letter - Hao Sun, Editor

PONE-D-20-03185

Are open set classification methods effective on large-scale datasets?

PLOS ONE

Dear Dr. Roady,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 28 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Hao Sun

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Financial Disclosure section:

'TH and CK were supported during this work in part by DARPA/MTO (https://www.darpa.mil/about-us/offices/mto) Lifelong Learning Machines program [W911NF-18-2-0263] and AFOSR (https://www.wpafb.af.mil/afrl/afosr/) grant [FA9550-18-1-0121].  The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.'

We note that one or more of the authors are employed by a commercial company: Paige, New York

a. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

b. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.  

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials." (as detailed online in our guide for authors: http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

c. Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

3. We note you have included a table to which you do not refer in the text of your manuscript.

Please ensure that you refer to Table 1 in your text; if accepted, production will need this reference to link the reader to the Table.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper studies the performance of major inference methods and regularization strategies for the purpose of open set classification (OSC). The manuscript is well written, comprehensively summarizing methods in different categories. It conducts multiple numerical tests using large-scale datasets and compares the accuracy and computational costs of different methods. In the end, it concludes with some useful recommendations for researchers in this field.

I recommend accepting this work after the authors address some minor issues:

1. While this paper targets large-scale datasets, the studied ResNet has only 18 layers, in contrast to popular networks that have hundreds of layers and convolutional filters (Lines 433-434). Accordingly, the authors further study the effects of model depth and width in Section 4.4. However, in Figure 7 the authors only consider τ-Softmax and ODIN. Could the authors justify why other inference/regularization methods are not considered?

2. In Lines 461-463, the authors state that "in general OSC performance decreases as the similarity between the OOD and in-distribution data increases". Could the authors explain this observation?

3. In Lines 428-431, the authors say the resulting ROC curves "demonstrate that there is little to no benefit from background class regularization versus standard cross-entropy training in the open set classification task". However, in Table 2 it seems that the AUOSC values from background class regularization on ImageNet-Open do show some improvement over the AUOSC values from cross-entropy. Additionally, in the previous section the authors mention that the AUOSC metric is a better indicator when comparing regularization techniques. Therefore, I feel that the authors should draw a more comprehensive conclusion by considering Table 2 and Figure 6 together.

4. In Figure 7, it seems that the legends don’t tell which curves are from Intra-Dataset.

Reviewer #2: This paper presents a method to optimize the training/validation data distribution in supervised classification. The authors consider two approaches: (1) inference methods to separate known from unknown data, and (2) feature space regularization strategies to improve model robustness to novel inputs. Unlike traditional methods that address these approaches separately, the authors uniquely combine them by exploring the relationship between the two and directly comparing performance on the ImageNet dataset. The authors consider regularization and specialized inference methods together and find good results on large-scale datasets such as ImageNet.

Advantages:

1. The authors present a detailed elaboration of the problem, namely the performance of open set classification on large-scale datasets.

2. The authors compare methods across open set classification paradigms on large-scale, high-resolution datasets instead of low-resolution MNIST and CIFAR.

3. The authors compare inference methods and feature space regularization strategies, and combine them for further evaluation on the out-of-distribution problem.

4. The authors identify certain regularization methods with better performance on the out-of-distribution problem.

5. The experimental data are well organized.

6. The comparison baselines are sufficient and up-to-date.

Weakness:

1. I don't know if it is a problem with the LaTeX template, but why are all figures at the bottom of the paper? Please reorganize them if it is not a template issue.

2. I think the authors should consider adding more details on the ablation study to this work, considering that independent input perturbation and temperature scaling factors are both well studied in other works.

3. Input perturbation is essentially a data augmentation method that has been widely used to improve DNN performance. The authors should consider more data augmentation methods in their approach and evaluate which one is better. At the same time, this would also benefit the ablation study in Figure 4.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Responses to Reviewers and Revisions to the Paper

Reviewer #1: While this paper targets large-scale datasets, the studied ResNet has only 18 layers, in contrast to popular networks that have hundreds of layers and convolutional filters (Lines 433-434). Accordingly, the authors further study the effects of model depth and width in Section 4.4. However, in Figure 7 the authors only consider τ-Softmax and ODIN. Could the authors justify why other inference/regularization methods are not considered?

ODIN was chosen because it was one of the best performing methods, and τ-Softmax was provided as a comparison. We agree that other inference methods can easily be added and serve as additional evidence for our initial analysis that OSC performance generally follows the same trend as closed-set accuracy when model capacity is varied within a ResNet architecture. We have updated Figure 7 to include the other inference methods considered in our paper. Additionally, we concluded that including both the Intra-dataset and Inter-dataset data for this ablation is unnecessary, as the trends from the inter-dataset (ImageNet-Open) OOD data are the most relevant for identifying trends.

Reviewer #1: In Lines 461-463, the authors state that "in general OSC performance decreases as the similarity between the OOD and in-distribution data increases". Could the authors explain this observation?

This was an empirical observation drawn from Figure 5. The underlying reason for this performance decrease is that OOD samples from novel classes are confused for known classes. This likely happens because the network learns features to distinguish between classes during normal discriminative training; if an OOD image shares some of these distinguishing features with a known class, there is a higher likelihood that the OOD image will be incorrectly identified.

We added this hypothesis in the discussion section following the discussion of feature space regularization strategies:

Fundamentally, the increase in OSC difficulty as the similarity increases between OOD and in-distribution samples is due to the network confusing OOD inputs with known classes. This confusion stems from the feature space of the CNN classifier which learns to be most sensitive to variations in the training distribution that are semantically meaningful while ignoring variations that are not semantically meaningful among the known classes. Dealing with semantically meaningful variations in images from both known and unknown classes that are not included in the training set is ultimately the most significant problem in the OSC process.

Reviewer #1: In Lines 428-431, the authors say the resulting ROC curves "demonstrate that there is little to no benefit from background class regularization versus standard cross-entropy training in the open set classification task". However, in Table 2 it seems that the AUOSC values from background class regularization on ImageNet-Open do show some improvement over the AUOSC values from cross-entropy. Additionally, in the previous section the authors mention that the AUOSC metric is a better indicator when comparing regularization techniques. Therefore, I feel that the authors should draw a more comprehensive conclusion by considering Table 2 and Figure 6 together.

We have tempered this statement describing the lack of benefit from background regularization in large-scale OSC problems. We have added the additional observation that, while the ROC charts qualitatively appear to show little benefit from background class regularization, we nevertheless found statistically significant increases in AUROC performance from this feature space regularization approach. The specific wording of the paragraph has been changed to:

In Fig. 6 we also show the resulting ROC curves for the ImageNet Intra-Dataset problem across the three feature spaces tested. While qualitatively there appears to be little benefit from background class regularization versus standard cross-entropy training, we did find significant differences in the AUROC metric calculated across the full range of OOD detection thresholds, as reported in Table 2.
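For reference, the AUROC metric discussed here can be computed directly from per-sample detection scores. A minimal sketch using the rank-based Mann-Whitney formulation follows; the scores are illustrative values, not data from the paper:

```python
def auroc(known_scores, unknown_scores):
    """AUROC via the Mann-Whitney formulation: the probability that a
    randomly drawn known (in-distribution) sample receives a higher
    detection score than a randomly drawn unknown (OOD) sample,
    counting ties as half a win."""
    wins = 0.0
    for k in known_scores:
        for u in unknown_scores:
            if k > u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))

# Illustrative scores: a perfect separator scores 1.0,
# and chance-level separation scores 0.5.
print(auroc([0.9, 0.8, 0.7], [0.1, 0.2]))  # 1.0
print(auroc([0.5, 0.5], [0.5, 0.5]))       # 0.5
```

Unlike a fixed-threshold error rate, this quantity aggregates behavior over all possible OOD detection thresholds, which is why it can reveal differences that a single ROC curve plot obscures.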

Reviewer #1: In Figure 7, it seems that the legends don’t tell which curves are from Intra-Dataset.

We have updated Figure 7. It now includes a wider variety of inference methods as explained in the first response above.

Reviewer #2: I don't know if it is the problem with the latex template but why all figures are at the bottom of the paper? Please reorganize it if it is not the template matter.

The submission template for PLOS ONE required figures to be submitted separately from text.

Reviewer #2: I think the authors should consider adding more details on the ablation study to this work, considering that independent input perturbation and temperature scaling factors are both well studied in other works. Input perturbation is essentially a data augmentation method that has been widely used to improve DNN performance. The authors should consider more data augmentation methods in their approach and evaluate which one is better.

We have included in our ablation of the ODIN method an independent analysis of input perturbation and temperature scaling. While we believe different data augmentation methods will have a significant effect on OOD detection performance, we focused on using the standard methods used by the creators of the approaches we are comparing to assess their capabilities with different inference methods on large-scale datasets.
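To illustrate one of the two factors ablated, the temperature-scaling half of ODIN-style scoring can be sketched as below. This is a simplified illustration, not the authors' code: the gradient-based input perturbation step is omitted because it requires backpropagation through the network, and all logit values are hypothetical:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: larger T flattens the distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def odin_style_score(logits, T=1000.0):
    # ODIN's detection score is the maximum temperature-scaled softmax
    # probability; inputs scoring below a chosen threshold are flagged
    # as out-of-distribution.
    return softmax(logits, T=T).max(axis=-1)

# Hypothetical logits: a confident in-distribution prediction vs. a
# flat, OOD-like one. The known sample should score higher.
known = np.array([10.0, 1.0, 0.5])
unknown = np.array([3.2, 3.0, 2.9])
assert odin_style_score(known, T=1.0) > odin_style_score(unknown, T=1.0)
```

Ablating the two factors independently then amounts to sweeping `T` with perturbation disabled, and vice versa, which is the structure of the analysis referenced above.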

Attachments
Attachment
Submitted filename: Response_to_Reviewers.pdf
Decision Letter - Hao Sun, Editor

Are open set classification methods effective on large-scale datasets?

PONE-D-20-03185R1

Dear Dr. Roady,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Hao Sun

Academic Editor

PLOS ONE

Formally Accepted
Acceptance Letter - Hao Sun, Editor

PONE-D-20-03185R1

Are open set classification methods effective on large-scale datasets?

Dear Dr. Roady:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Hao Sun

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.