Revealing undergraduate biology students’ conception of variability and error bars within graphing

Lauren Stoczynski; David Zis; Anna Woodruff; Susan Maruca; Eli Meir; Joel K. Abraham; Stephanie M. Gardner

doi:10.1371/journal.pone.0343301

Peer Review History

Original SubmissionSeptember 9, 2025
4 Dec 2025 Decision Letter - David R Wessner, Editor Dear Dr. Stoczynski, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. ============================== The study is impressively large and your analysis is quite thorough, as both reviewers noted. Your descriptions of the value of this study also should be commended. Despite quantitative literacy being a pillar of Vision and Change, we too often do not systematically assess our students knowledge of graphing, error bars, statistical significance, etc. This work help fill this gap.One reviewer provided many methodological and data presentation suggestions. I would encourage you to review these points. In my opinion though, these should just be considered as suggestions. The other reviewer raised one major concern regarding the figures. Please check Fig. 8 and Fig. 9. Both are mentioned in the text, but it appears that one of them is not attached to the submission. ============================== Please submit your revised manuscript by Jan 18 2026 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols . We look forward to receiving your revised manuscript. Kind regards, David R Wessner, Ph.D. Academic Editor PLOS ONE Journal requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Thank you for stating the following financial disclosure: “This research received grants from the National Science Foundation under grant Nos 1726180 and 2111150. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.” Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf. 3. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process. 4. We note that there is identifying data in the Supporting Information file <FirstClassVariabilityData.xlsx and variability by graphs.xlsx>. Due to the inclusion of these potentially identifying data, we have removed this file from your file inventory. Prior to sharing human research participant data, authors should consult with an ethics committee to ensure data are shared in accordance with participant consent and all applicable local laws. Data sharing should never compromise participant privacy. It is therefore not appropriate to publicly share personally identifiable data on human research participants. The following are examples of data that should not be shared: -Name, initials, physical address -Ages more specific than whole numbers -Internet protocol (IP) address -Specific dates (birth dates, death dates, examination dates, etc.) -Contact information such as phone number or email address -Location data -ID numbers that seem specific (long numbers, include initials, titled “Hospital ID”) rather than random (small numbers in numerical order) Data that are not directly identifying may also be inappropriate to share, as in combination they can become identifying. For example, data collected from a small group of participants, vulnerable populations, or private groups should not be shared if they involve indirect identifiers (such as sex, ethnicity, location, etc.) that may risk the identification of study participants. Additional guidance on preparing raw data for publication can be found in our Data Policy (https://journals.plos.org/plosone/s/data-availability#loc-human-research-participant-data-and-other-sensitive-data) and in the following article: http://www.bmj.com/content/340/bmj.c181.long. Please remove or anonymize all personal information (ID numbers), ensure that the data shared are in accordance with participant consent, and re-upload a fully anonymized data set. Please note that spreadsheet columns with personal information must be removed and not hidden as all hidden columns will appear in the published file. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? Reviewer #1: Partly Reviewer #2: Yes ******** 2. Has the statistical analysis been performed appropriately and rigorously? -->?> Reviewer #1: No Reviewer #2: Yes ****** 3. Have the authors made all data underlying the findings in their manuscript fully available??> The PLOS Data policy Reviewer #1: Yes Reviewer #2: Yes ****** 4. Is the manuscript presented in an intelligible fashion and written in standard English??> Reviewer #1: Yes Reviewer #2: Yes ****** Reviewer #1: 1. The study presents the results of original research This study appears to present the results of original research. The topic is timely, and the authors demonstrate technical proficiency in both quantitative and qualitative methods. 2. Results have not been published elsewhere It does not appear that the results have been reported elsewhere. The work seems to be novel and contributes original findings to the field. 3. Experiments, statistics, and analyses a. Statistical Analysis The study employs over twenty chi-square tests on a large sample and produces more than twenty plots to illustrate category-level percentages. I recommend that the authors consider a more efficient and integrative approach. Rather than conducting a series of separate chi-square tests with a common dependent variable (for example, “Made Raw Bar Graphs”), the authors could model all outcomes simultaneously using multinomial logistic regression. If the dependent variable were Type of Graph Created (e.g., “Raw Bar,” “Error Bar,” “Mean Graph”), the predictors such as interpreting raw bar graphs or interpreting error bar graphs could be entered into a single multinomial logistic regression model. This would allow the researchers to estimate the probability of each outcome while controlling for all predictors in a single model rather than running multiple independent tests. There are several limitations to running multiple chi-square tests. Large sample sizes make chi-square tests prone to significance even when the practical differences are negligible. Moreover, conducting multiple binary tests with one independent and one dependent variable increases the likelihood of a Type I error or false positive. In contrast, multinomial logistic regression offers several advantages. It reduces familywise error and thus decreases the likelihood of Type I error. It provides a more efficient analytic framework by integrating predictors into one model, and it produces interpretable coefficients and odds ratios that facilitate comparison across predictors. Finally, it supports concise tabular presentation of results, which is considerably more reader-friendly than numerous bar plots. Overall, this approach would yield more coherent and statistically robust conclusions that are well supported by the data. I encourage the authors to consider this method using the multinom() function from the nnet package in R, which performs multinomial logistic regression. This approach allows multiple categorical outcomes to be modeled simultaneously rather than conducting a series of separate chi-square tests. The multinom() function efficiently estimates the probability of each outcome category as a function of one or more predictors, offering odds ratios and confidence intervals that enhance interpretability and parsimony. For example, if the dependent variable were Graph_Type (e.g., "raw", "error_bar", "mean") and the independent variables were Interpreted_Error_Bar, Interpreted_Raw_Bar, and Interpreted_Mean_Bar, a simple model could be specified as: library(nnet) model <- multinom(Graph_Type ~ Interpreted_Error_Bar + Interpreted_Raw_Bar + Interpreted_Mean_Bar, data = your_data) summary(model) This example is generic. I encourage the authors to consider whether this would work and research it and to feel free to dispute this recommendation in their response if it does not work, explaining why. Be sure to include the regression tables in your manuscript. These would include the coefficients for each predictor relative to the reference category of the dependent variable, the standard errors, z-values, p-values, and odds ratios with 95% confidence intervals. It can also be helpful to include predicted probabilities for each outcome category to illustrate the practical interpretation of the model. Organizing the table so that each dependent variable category is clearly labeled and showing all relevant predictors side by side will make it easier for readers to understand the results without having to refer to multiple figures. b. Multiple Figures vs. One Table As an R user, I genuinely appreciate the researchers’ figures. They demonstrate a strong command of R’s advanced graphing capabilities and reflect thoughtful attention to data visualization. However, I recommend reconsidering the presentation format. The inclusion of more than twenty figures, each depicting a subset of categorical comparisons, makes the results difficult to follow and interpret as a coherent whole. A more effective alternative would be to replace the numerous figures with a single comprehensive table that displays the counts and percentages for all relevant variable combinations, with clear labeling to indicate which variables or categories each row represents. A well-designed table would allow readers to view all comparisons at once, identify patterns and relationships across variables more efficiently, reference specific values precisely, and cross-validate their interpretation against the statistical results, such as those produced by a multinomial logistic regression. If the authors wish to retain visualization for accessibility and readability, they might consider supplementing the table with one or two summary figures that highlight key findings or effect sizes, such as predicted probabilities derived from the regression model. This would balance visual appeal with analytic clarity. The table itself should be self-contained, including variable names, category labels, sample sizes, and proportions, and it should mirror the structure of the statistical model. Each dependent variable category could be presented as a block, with independent variables shown alongside the relevant counts, percentages, and model-derived statistics such as odds ratios, confidence intervals, and p-values. Adopting this approach would substantially enhance clarity, interpretability, and reproducibility while preserving the authors’ evident technical skill in R. c. Qualitative Portion The qualitative section of the study is interesting, particularly in its use of coded data as numeric variables for some of the chi-square tests. There are, however, several areas that could be strengthened. First, I recommend consulting and citing Saldaña’s Coding Manual for Qualitative Researchers (2025 edition). This resource distinguishes between “thematic coding” and “theming the data,” which may help the authors refine their methodological description. Second, greater clarity is needed regarding the term “inter-rater reliability.” The text suggests that this might refer instead to inter-coder agreement, where a given percentage of codes overlap between coders. Inter-rater reliability typically refers to consistency among raters using a psychometric instrument such as a rubric, whereas inter-coder agreement assesses similarity in qualitative coding. The authors should clarify their intended meaning and provide appropriate citations. Finally, more detail on the coding process would strengthen this section. The authors mention both inductive coding and the use of a codebook, which suggests a hybrid or sequential approach. Inductive coding generally implies open coding without predefined categories, while a codebook implies a deductive or a priori framework. Clarifying whether the codebook was generated after an initial inductive phase and subsequently applied deductively would help readers better understand the analytic process. d. Citations of Methods I also recommend citing and referencing all R packages and software used in the analysis, as well as any qualitative sources that informed the methodology. Within the R community, it is considered both best practice and professional courtesy to credit package developers, particularly in publications such as PLOS that value open science and reproducibility. Including these citations would also enhance the transparency and replicability of the study and will help the developers of the code. 4. Research Ethics I believe strong research ethics and publication standards were followed in this study. The research appears to meet all applicable standards for the ethics of experimentation and research integrity. 5. Reporting and Data Availability The article generally adheres to appropriate reporting guidelines and community standards of data availability. Sharing the R code used in the analysis would further strengthen transparency and reproducibility and would help readers understand how the models were derived from the multiple-tab Excel file. 6. Final Thoughts Overall, this is a promising and well-conceived study that demonstrates both technical and methodological sophistication. My primary recommendations are to consolidate the statistical analyses through multinomial logistic regression, simplify and clarify result presentation by replacing numerous figures with a comprehensive table, and provide additional methodological detail in the qualitative section. Incorporating citations for software and methodological sources, along with sharing the analytic code, would further enhance the transparency, rigor, and overall contribution of the paper. Reviewer #2: This paper reports findings from an impressively large study of undergraduate biology students interpreting graphs, focussing on their understanding of error bars and data variability. The results are interesting. There are some reasonably expected results, showing for example how students who generate plots with error bars have better understanding of them, people tending to find the type of plot they make as easier to understand, and reasonable shifts in descriptions based on whether they made raw bar graphs or those with error bars. Counterintuitive results included no apparent increase in understanding or confidence with increasing year of study. This work highlights important features of teaching plotting data to biology students and importance of conveying meaning and use of error bars. Overall the study is robust and well done, and the report is clearly written. There are a few areas that need addressing: Major comments: 1. There is a figure missing. The legend for Fig 9 matches what is provided as Fig 8; there is no figure provided that matches the description of Fig 8. Please provide the missing figure. 2. Throughout, it is noted if things are “significant” but the methods do not provide information about the significance criteria applied. Please add a line to the “Quantitative data analysis” section indicating what criteria (e.g., P<0.05, P<0.01) is used. Additionally, it is unclear if any correction for multiple testing is applied before evaluating significance from the chi-square/Fisher’s or Kruskal Wallis tests. Please indicate if any correction to the p-value or significant criteria were applied to correct for multiple tests and justify what is considered together in one unit for the ‘multiple’; if none were applied, please justify this. 3. Please provide information in the methods as to what you assessed as the correct answer to the “easy to interpret”/“hard to interpret”/“not shown” question, and explanation for how you determined what is easy vs hard (it looks to me like either error bars or individual values provided in an ordered matter made it “easy”). 4. I worry that some of the axis labels and in-figure legends will be unreadably small in a PDF/print version. Only Fig 1 and 4 look reasonably sized – please increase text size in others. 5. Line 848: it appears the legend for Fig 5 is incomplete. Please add the information alluded to about the number of each graph type the results are based on. Also please explain the terms used to describe each type of graph generated by students, e.g., Bar(raw) = value bar chart, etc, (in particular I’m not sure what two cat ones are? I’m thinking ‘bar and whiskers’ box plot types, but please explain.) 6. Lines 885-886: I am struggling to match up the letters A-F with significant post-hoc comparisons – please clarify what each letter refers to for the plots. Minor comments: 1. Lines 444-446: This sentence in the abstract, “When responses were linked with a graph the student made, those who created a bar graph with raw data showed lower understanding of error bars compared to students who themselves created a bar graph with aggregated data and error bars.”, is slightly confusing without context as it is not clear what ‘raw’ vs ‘aggregated’ data means. I suggest something a little more like “a bar graph showing every data point as a separate bar” vs “a bar graph showing aggregated means and error bars” 2. Lines 471 & 472: something is cited for 10 competencies, but they are not named, so the reader has to back-think to this sentence when “plan competency” and “analyze competency” is mentioned. I suggest either naming the 10, or say something like “including plan and analyze”. 3. Line 480: no comma before (2020) 4. Lines 501-502: I’m not sure ‘value’ instead of a ‘mean’ is that clear an explanation of a value bar chart, since a mean is also a type of value. Potentially something like “an individual data point’s value” would be clearer. 5. Lines 570-571: misplaced newline after “whether” 6. Line 578: no apostrophe in “question’s” (should be “questions”) 7. Line 601 (Table 1 legend), 668 (Fig 3 legend), 800 (Fig 4 legend): these all mention “the first time” for the students taking the assessment, however, it is already specified in methods that all the analysis is about the first time students take this – leaving this in the legends implies you are going to analyse more, so I suggest removing it from these legends. 8. Line 671: elsewhere it says “Error Terms” instead of “Form of Error”, so I suggest using “Error Terms” here as well, or just “Error” 9. Lines 676-765: please refer to appropriate sections of Figure 3 when reporting the relevant data 10. Line 985: there is an extraneous closing parenthesis at the end of this line. 11. Line 990: no comma before (2009) 12. Line 1010: no comma before (2005) 13. Line 1028: “thinking of the sample size as something that should be aggregated” is a little confusing – it’s not the sample size that is being aggregated but the individual values for each sample in the sample size. Perhaps something more like “thinking of aggregating across all data points within a sample” ****** what does this mean? ). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy Reviewer #1: Yes: Mark A. Perkins Reviewer #2: No ******** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation. NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications. https://doi.org/10.1371/journal.pone.0343301.r001
Revision 1
20 Jan 2026 Author Response Due to inclusion of potentially identifying data, the excel files were removed from the file inventory. We removed the “CourseID” column as that was the only column I could see that they were saying was identifying from FirstClassVariabilityData file. From the variability by graphs file, removed the timestamp and courseID columns. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Yes ________________________________________ 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: Yes ________________________________________ 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ________________________________________ 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ________________________________________ 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: 1. The study presents the results of original research This study appears to present the results of original research. The topic is timely, and the authors demonstrate technical proficiency in both quantitative and qualitative methods. 2. Results have not been published elsewhere It does not appear that the results have been reported elsewhere. The work seems to be novel and contributes original findings to the field. 3. Experiments, statistics, and analyses a. Statistical Analysis The study employs over twenty chi-square tests on a large sample and produces more than twenty plots to illustrate category-level percentages. I recommend that the authors consider a more efficient and integrative approach. Rather than conducting a series of separate chi-square tests with a common dependent variable (for example, “Made Raw Bar Graphs”), the authors could model all outcomes simultaneously using multinomial logistic regression. If the dependent variable were Type of Graph Created (e.g., “Raw Bar,” “Error Bar,” “Mean Graph”), the predictors such as interpreting raw bar graphs or interpreting error bar graphs could be entered into a single multinomial logistic regression model. This would allow the researchers to estimate the probability of each outcome while controlling for all predictors in a single model rather than running multiple independent tests. There are several limitations to running multiple chi-square tests. Large sample sizes make chi-square tests prone to significance even when the practical differences are negligible. Moreover, conducting multiple binary tests with one independent and one dependent variable increases the likelihood of a Type I error or false positive. In contrast, multinomial logistic regression offers several advantages. It reduces familywise error and thus decreases the likelihood of Type I error. It provides a more efficient analytic framework by integrating predictors into one model, and it produces interpretable coefficients and odds ratios that facilitate comparison across predictors. Finally, it supports concise tabular presentation of results, which is considerably more reader-friendly than numerous bar plots. Overall, this approach would yield more coherent and statistically robust conclusions that are well supported by the data. I encourage the authors to consider this method using the multinom() function from the nnet package in R, which performs multinomial logistic regression. This approach allows multiple categorical outcomes to be modeled simultaneously rather than conducting a series of separate chi-square tests. The multinom() function efficiently estimates the probability of each outcome category as a function of one or more predictors, offering odds ratios and confidence intervals that enhance interpretability and parsimony. For example, if the dependent variable were Graph_Type (e.g., "raw", "error_bar", "mean") and the independent variables were Interpreted_Error_Bar, Interpreted_Raw_Bar, and Interpreted_Mean_Bar, a simple model could be specified as: library(nnet) model <- multinom(Graph_Type ~ Interpreted_Error_Bar + Interpreted_Raw_Bar + Interpreted_Mean_Bar, data = your_data) summary(model) This example is generic. I encourage the authors to consider whether this would work and research it and to feel free to dispute this recommendation in their response if it does not work, explaining why. Be sure to include the regression tables in your manuscript. These would include the coefficients for each predictor relative to the reference category of the dependent variable, the standard errors, z-values, p-values, and odds ratios with 95% confidence intervals. It can also be helpful to include predicted probabilities for each outcome category to illustrate the practical interpretation of the model. Organizing the table so that each dependent variable category is clearly labeled and showing all relevant predictors side by side will make it easier for readers to understand the results without having to refer to multiple figures. While we appreciate the reviewer’s comments in suggesting using a multinomial logistic regression, upon doing some research and using the code the reviewer provided, we have decided to stay with our chi square approach. The multinomial logistic regression pulled out one of the independent variables and compared the other independent variables to that one. This approach would have meant to get the full model, we would have needed to run multiple multinomial logistic regressions designating a different reference variable. We also still would have needed multiple regressions for each of the different questions we asked, which was a similar vein to the chi-square test. We have applied a post hoc analysis (Bonferroni correction) to the chi-square tests to reduce the probability of a Type I error, which caused some minor changes to our results. In the supplemental material we will provide the critical value and residual table for each of the chi-square tests. b. Multiple Figures vs. One Table As an R user, I genuinely appreciate the researchers’ figures. They demonstrate a strong command of R’s advanced graphing capabilities and reflect thoughtful attention to data visualization. However, I recommend reconsidering the presentation format. The inclusion of more than twenty figures, each depicting a subset of categorical comparisons, makes the results difficult to follow and interpret as a coherent whole. A more effective alternative would be to replace the numerous figures with a single comprehensive table that displays the counts and percentages for all relevant variable combinations, with clear labeling to indicate which variables or categories each row represents. While we appreciate the reviewer's suggestion, we think the data are easier to visualize within figures and not within a comprehensive table. Several of the figures have been reformatted for better readability and we thank the reviewer for encouraging us to reflect on how to best present our data. A well-designed table would allow readers to view all comparisons at once, identify patterns and relationships across variables more efficiently, reference specific values precisely, and cross-validate their interpretation against the statistical results, such as those produced by a multinomial logistic regression. If the authors wish to retain visualization for accessibility and readability, they might consider supplementing the table with one or two summary figures that highlight key findings or effect sizes, such as predicted probabilities derived from the regression model. This would balance visual appeal with analytic clarity. The table itself should be self-contained, including variable names, category labels, sample sizes, and proportions, and it should mirror the structure of the statistical model. Each dependent variable category could be presented as a block, with independent variables shown alongside the relevant counts, percentages, and model-derived statistics such as odds ratios, confidence intervals, and p-values. Adopting this approach would substantially enhance clarity, interpretability, and reproducibility while preserving the authors’ evident technical skill in R. c. Qualitative Portion The qualitative section of the study is interesting, particularly in its use of coded data as numeric variables for some of the chi-square tests. There are, however, several areas that could be strengthened. First, I recommend consulting and citing Saldaña’s Coding Manual for Qualitative Researchers (2025 edition). This resource distinguishes between “thematic coding” and “theming the data,” which may help the authors refine their methodological description. Second, greater clarity is needed regarding the term “inter-rater reliability.” The text suggests that this might refer instead to inter-coder agreement, where a given percentage of codes overlap between coders. Inter-rater reliability typically refers to consistency among raters using a psychometric instrument such as a rubric, whereas inter-coder agreement assesses similarity in qualitative coding. The authors should clarify their intended meaning and provide appropriate citations. Finally, more detail on the coding process would strengthen this section. The authors mention both inductive coding and the use of a codebook, which suggests a hybrid or sequential approach. Inductive coding generally implies open coding without predefined categories, while a codebook implies a deductive or a priori framework. Clarifying whether the codebook was generated after an initial inductive phase and subsequently applied deductively would help readers better understand the analytic process. Thank you for the feedback and suggestions for strengthening this section of the manuscript. We appreciate the clarity that your suggestions will provide. We have referenced Saldaña and others to describe our qualitative data analysis approach but have added more detail. As noted by the reviewer, inter-coder agreement is aligned with our approach and incorporation of qualitative data to answer our research questions and complement the quantitative analysis. We have revised that section of the manuscript to reflect that. We have additionally revised our description of the step-wise process we applied with respect to how the codebook was generated and used to analyze our qualitative data. d. Citations of Methods I also recommend citing and referencing all R packages and software used in the analysis, as well as any qualitative sources that informed the methodology. Within the R community, it is considered both best practice and professional courtesy to credit package developers, particularly in publications such as PLOS that value open science and reproducibility. Including these citations would also enhance the transparency and replicability of the study and will help the developers of the code. We appreciate the reminder from the reviewer to cite R and the FSA package. These citations have been added to the manuscript. 4. Research Ethics I believe strong research ethics and publication standards were followed in this study. The research appears to meet all applicable standards for the ethics of experimentation and research integrity. 5. Reporting and Data Availability The article generally adheres to appropriate reporting guidelines and community standards of data availability. Sharing the R code used in the analysis would further strengthen transparency and reproducibility and would help readers understand how the models were derived from the multiple-tab Excel file. We will provide the R code used in these analyses with the supplemental information. 6. Final Thoughts Overall, this is a promising and well-conceived study that demonstrates both technical and methodological sophistication. My primary recommendations are to consolidate the statistical analyses through multinomial logistic regression, simplify and clarify result presentation by replacing numerous figures with a comprehensive table, and provide additional methodological detail in the qualitative section. Incorporating citations for software and methodological sources, along with sharing the analytic code, would further enhance the transparency, rigor, and overall contribution of the paper. Reviewer #2: This paper reports findings from an impressively large study of undergraduate biology students interpreting graphs, focusing on their understanding of error bars and data variability. The results are interesting. There are some reasonably expected results, showing for example how students who generate plots with error bars have better understanding of them, people tending to find the type of plot they make as easier to understand, and reasonable shifts in descriptions based on whether they made raw bar graphs or those with error bars. Counterintuitive results included no apparent increase in understanding or confidence with increasing year of study. This work highlights important features of teaching plotting data to biology students and importance of conveying meaning and use of error bars. Overall the study is robust and well done, and the report is clearly written. There are a few areas that need addressing: Major comments: 1. There is a figure missing. The legend for Fig 9 matches what is provided as Fig 8; there is no figure provided that matches the description of Fig 8. Please provide the missing figure. We appreciate the reviewer pointing this out, Figure 9 is now included with the manuscript 2. Throughout, it is noted if things are “significant” but the methods do not provide information about the significance criteria applied. Please add a line to the “Quantitative data analysis” section indicating what criteria (e.g., P<0.05, P<0.01) is used. Additionally, it is unclear if any correction for multiple testing is applied before evaluating significance from the chi-square/Fisher’s or Kruskal Wallis tests. Please indicate if any correction to the p-value or significant criteria were applied to correct for multiple tests and justify what is considered together in one unit for the ‘multiple’; if none were applied, please justify this. We have noted that the significance threshold is 0.05 on line 642. Additionally, we have noted that a Bonferroni correction was used with all analyses to reduce the probability of Type I error on line 652. 3. Please provide information in the methods as to what you assessed as the correct answer to the “easy to interpret”/“hard to interpret”/“not shown” question, and explanation for how you determined what is easy vs hard (it looks to me like either error bars or individual values prov Attachments Attachment Submitted filename: Response to Reviewers.docx https://doi.org/10.1371/journal.pone.0343301.r002
5 Feb 2026 Decision Letter - David R Wessner, Editor <p>Revealing undergraduate biology students' conception of variability and error bars within graphing PONE-D-25-48061R1 Dear Dr. Stoczynski, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. For questions related to billing, please contact billing support . If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, David R Wessner, Ph.D. Academic Editor PLOS One Additional Editor Comments (optional): Thank you for submitting this revised manuscript and for thoroughly and thoughtfully considering the suggests from the reviewers. Although both reviewers had several questions about the original manuscript, they also both noted that it described an impressive body of work and should contribute meaningfully to the literature. With the changes you made in response to their comments, the revised manuscript is much stronger. It should be of general interest to STEM educators. Thanks for submitting this work to PLOS One. Reviewers' comments: https://doi.org/10.1371/journal.pone.0343301.r003
Formally Accepted
Acceptance Letter - David R Wessner, Editor PONE-D-25-48061R1 PLOS One Dear Dr. Stoczynski, I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team. At this stage, our production department will prepare your paper for publication. This includes ensuring the following: * All references, tables, and figures are properly cited * All relevant supporting information is included in the manuscript submission, * There are no issues that prevent the paper from being properly typeset You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps. Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing. If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. David R Wessner Academic Editor PLOS One https://doi.org/10.1371/journal.pone.0343301.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio .