Peer Review History
Original Submission: October 23, 2019
PONE-D-19-29580

Citizen science and machine learning for predicting spatio-temporal patterns in seabird migration

PLOS ONE

Dear Dr Martin,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I apologize for the long handling time, but securing reviewers proved very difficult. Because I was only able to secure one review, I reviewed the manuscript myself and concluded that it needs substantial revisions before it can be published. I am forwarding an annotated version with inserted comments, edits, and questions. In particular, I urge the authors to:

1) Revise the introduction to remove redundancies and focus on the main goals.

2) Revise the introduction and discussion to ensure the limitations of citizen science data are acknowledged. Currently, the manuscript highlights the limitations of tracking and vessel-based surveys, but there are many similar limitations to citizen science data: uncertainty of identification, uncertainty of counts, uneven coverage and lack of systematic effort, and lack of knowledge of the colony provenance of the birds sighted at sea.

3) Revise the methods section (and augment it with supplementary tables) to give readers a sense of how the sighting / fishery / environmental datasets were cleaned and processed. In particular, address the uncertainties associated with the species identifications, and explicitly explain how the fishery data were subsetted by sector / fishery / target.

4) Furthermore, please provide evidence that the authors have cleared the terms of use of these publicly available sighting data, and are thus able to publish these datasets. Moreover, explicitly state how many records / birds were included in each dataset, for the pre- and post-breeding migration. Finally, explicitly state whether "zero" counts were included (i.e., standardized surveys, or sightings of other species, that were reported without any Balearic Shearwater sightings).

5) The methods also need to be strengthened, since the manuscript refers to several exploratory analyses (daily abundances used to censor the study period and collinearity of explanatory variables), but no results are provided. I urge the authors to provide supplementary tables with these results, so that readers can evaluate what was done.

6) The results also need to be strengthened, to ensure tests are clearly reported and evaluated. In particular, please use the comments added to the manuscript to address the comparison of the model R-squared values and the correlations of modeled / observed abundances. Finally, please justify the small number of iterations used (25) and the use of means ± 95% CI for some of the estimated parameters, rather than medians (25% - 75%).

7) The discussion and conclusions were clear and concise, but they are difficult to evaluate, given the paucity of details in the methods / results. Finally, I urge the authors to carefully consider how they identify "key" areas, and to consider that they are only using counts; there are no behavioral data (e.g., feeding versus transiting). Namely, would a high flux area (e.g., Gibraltar Strait) with lower counts be more sensitive to impacts than a foraging area with higher counts but lower flux?

We would appreciate receiving your revised manuscript by Feb 22 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.
To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols Please include the following items when submitting your revised manuscript:
Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,
David Hyrenbach, Ph.D.
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating in your Funding Statement: 'This study is part of a research project (“Environmental factors determining the interannual variation in the migration of Balearic and Scopoli’s shearwaters in the Mediterranean”, 2018-2019) which has been partly financed by the Annual Programme of Grants of the Instituto de Estudios Ceutíes (IEC, Autonomous City of Ceuta, Spain), years 2018-2019.'

a. Please provide an amended statement that declares *all* the funding or sources of support (whether external or internal to your organization) received during this study, as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now. Please also include the statement “There was no additional external funding received for this study.” in your updated Funding Statement.

b. Please include your amended Funding Statement within your cover letter. We will change the online submission form on your behalf.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes

5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this manuscript, the authors present a novel work to evaluate whether data from a long-term monitoring programme carried out by volunteers, i.e., a citizen science project, can be used to assess bird migratory patterns. They focused on an endangered seabird species endemic to the Mediterranean Sea that migrates to the Atlantic outside the breeding period. The authors relied on supervised machine learning techniques to predict bird abundance, building the models from data gathered at a network of bird census stations. Specifically, they used random forest regression models to evaluate the usefulness of citizen science data in predicting spatial and temporal abundance based on a set of predictors, including many satellite-derived features, reaching an accuracy of about 70%. The authors concluded that combining long-term citizen science with predictive modelling can be a reliable tool to assist long-term monitoring of sensitive species, to identify important areas and/or to detect trends.

The paper is well structured. The study is accurate and provides an interesting approach. As a general comment, the major strength of this study lies in the source data used. Trektellen is an online database recording opportunistic seabird counts carried out by volunteers across Europe. Despite its potential, very few papers have made use of this long-term database. On the other hand, there are also some weaknesses in the work that should be amended for publication. My main concern in this regard is that the authors tend to overstate, throughout the manuscript, the importance of random forest (RF) for predicting spatial and temporal abundance. RF is a really powerful algorithm (I personally really love it) based on an ensemble of decision trees, but its use is not new in the context of species distribution/abundance modelling (e.g. Oppel et al.
(2012) Comparison of five modelling techniques to predict the spatial distribution and abundance of seabirds. Biological Conservation, 156, 94-104). In fact, recent approaches for species distribution modelling tend to build a final output based on an ensemble of several algorithms (a.k.a. Ensemble Ecological Niche Models), including RF among them. See the “ssdm” package in R for details (Schmitt et al. (2017) ssdm: An r package to predict distribution of species richness and composition based on stacked species distribution models. Methods in Ecology and Evolution, 8(12), 1795-1803; and also, e.g., Pereira et al. (2018) Using a multi-model ensemble forecasting approach to identify key marine protected areas for seabirds in the Portuguese coast. Ocean & Coastal Management, 153, 98-107.). Apart from that, a major weakness of RF, just as with decision trees, is that it cannot extrapolate outside the range of the training data, and therefore great care must be taken when these algorithms are used to predict long-term trends, such as those expected in the context of climate change. For these reasons, I think the authors should tone down the importance of using this machine learning algorithm in the paper, beginning with removing it from the title.

Below I refer to specific sections and lines to provide comments:

Title
As I have already commented, I would recommend changing the title, omitting the reference to machine learning in it.

Abstract: There is a lack of introduction. Please include at least one or two sentences about the general framework before stating what you addressed.

Introduction: In general, I think the introduction needs a bit of work. Some sentences are not explained well enough. The first paragraph is not very cohesive.

L49: Should be rephrased. Although I understand what the authors mean, these concepts could be explained in a clearer way.
L52-53: I would say seabirds are threatened because they are exposed to multiple threats (most of them related to human activities) as they cross and use multiple habitats year-round, not because they are subjected to many different environmental conditions (to which they are, or should be, adapted). Please rephrase.
L53: I do not understand why you use “however”. As I understand it, that phrase is talking about something completely different from the previous ones.
L54: Reference 3 does not support what is said.
L60: Reference 7 is about passerines. You should provide more references regarding the importance of stop-overs in species of marine or coastal habits.
L62-66: I do not agree with what is stated here. Determining the spatial distribution of seabirds during the migration period is a difficult task because the birds are generally located beyond the reach of the human eye. It is not obvious to me what you mean when saying “surveys of birds are usually limited to specific subsets of the total migratory route”.
L64: Your statement is false for seabirds. Ring recoveries are not useful for identifying general movement patterns in seabirds, not only stop-over areas.
L68: Accelerometers are not intended to track animal movements but behaviour, so they do not contribute to knowledge of the at-sea distribution of seabirds.
L69: I do not agree. Light-level geolocators are the most extensively used devices to track migratory movements of seabirds, and cost is not a general problem with them. Indeed, geolocators have the advantages of limited weight, low price, animal welfare and easy attachment, so high sample sizes are affordable. The real problem is what you explain in the next phrase. Even good sample sizes could be unrepresentative of seabird colony size (Soanes et al. (2013) How many seabirds do we need to track to define home‐range area? Journal of Applied Ecology, 50(3), 671-679; Thaxter, et al.
(2017) Sample size required to characterize area use of tracked seabirds. The Journal of Wildlife Management, 81(6), 1098-1109).
L75: References are in the wrong format. Moreover, neither reference refers to seabirds, even though you have been focused on seabirds since the first paragraph. Please use more appropriate references (i.e. regarding seabirds).
L76: The format of the references is wrong.
L77-78: Electronic devices are intended to provide individual tracking and thus yield information at the individual level. Bearing this in mind, they can indeed allow long-term monitoring, and individuals of different species have already been tracked for more than a decade. In the case of seabirds and geolocators, devices can also be replaced annually, so as long as individuals are alive and can be recaptured, long-term tracking can be carried out. Last, a major advantage of individual tracking is the fact that you know the individual's origin (i.e. colony, population). So, the weakness you state for tracking devices is inconsistent. Moreover, the weakness you argue for electronic devices also applies to census surveys, as birds counted at coastal points or in vessel surveys are of unknown origin, and even species identification could be difficult for some species (such as Balearic and Mediterranean shearwater). Instead of looking for weaknesses of tracking methods, you should highlight the strengths of census methods (e.g. they give an overview picture despite not knowing ages or colonies of origin). You should also mention a major weakness of census methods for seabirds, which is the impossibility of counting birds when they use areas out of human sight during wintering or migration.
L84-91: I think here you did well in describing the strengths of citizen science projects. I would suggest combining these lines with the previous paragraph, once the issues I raised before are removed/resolved. This reference may also be of interest for citation: Coxen, C. L., Frey, J. K., Carleton, S. A., & Collins, D. P.
(2017) Species distribution models for a migratory bird based on citizen science and satellite tracking data. Global ecology and conservation, 11, 298-311. Last, you should introduce Trektellen here and comment on whether any other paper has used this data source before, as I see this is a really good point of your paper.
L92-96: These lines could be in a different paragraph and be complemented with information about techniques for species distribution modelling. As I already said, many algorithms are available to model and predict species distribution and abundance, and some methods even integrate multiple algorithms. Thus, there is room here to give readers a brief introduction to this issue. I would include information on satellite-derived features as the second part of the paragraph, as such features are the predictors used in the models.
L100: There is a typo after “shearwaters”.
L108-109: Please provide a reference. Which unused areas?
L114: Replace “useful data” by “useful source of data”.
L116: Replace “these datasets” by “citizen science data”.
L118: “most likely pelagic habitats”. This is inconsistent. You said in L112 that the species occurs mainly in shallow coastal waters. Also, observation from the coast frequently only allows counting birds in coastal waters. So identifying pelagic habitats used during migration seems unfeasible.
L120: I believe the aim was to evaluate the performance, not to determine it.
L121: Replace “seabirds” by “Balearic shearwaters”.
L122: Remove “of these and other species”. Also, you could include at the end something like “and discuss the potential of citizen science data for seabird conservation”.
L134-136: I suggest you include this in the next section.
L147: Ref. 32 is related to a different species; remove it.
L148: You should cite this paper, as one of the first describing environmental features to predict the presence of Balearic shearwaters: Louzao, M. et al.
(2006) Oceanographic habitat of an endangered Mediterranean procellariiform: implications for marine protected areas. Ecological applications, 16(5), 1683-1695.
L33: This reference is from a symposium in 1993. Please try to provide more appropriate references. A lot of really good work has been done in the last 20 years with Balearic shearwaters, even addressing the specific question you mention in L10-151. Just two examples:
• Jones, A. R., Wynn, R. B., Yésou, P., Thébault, L., Collins, P., Suberg, L., ... & Brereton, T. M. (2014). Using integrated land-and boat-based surveys to inform conservation of the Critically Endangered Balearic shearwater. Endangered Species Research, 25(1), 1-18.
• Wynn, R. B., Josey, S. A., Martin, A. P., Johns, D. G., & Yésou, P. (2007). Climate-driven range expansion of a critically endangered top predator in northeast Atlantic waters. Biology Letters, 3(5), 529-532.
L151: Replace “pelagic and demersal” by “pelagic but also demersal”.
L151: You start the phrase referring to the Balearic shearwater’s diet. I guess that where you say “but it also feeds” you are referring to the species, not the diet. Let’s say: “Balearic shearwater’s diet includes small pelagic but also demersal fish, frequently obtained from trawling discards. The species can eventually feed on plankton and macrozooplankton, specifically krill”.

Material & Methods.
L166: I recommend changing “observatories” to “sighting points” for clarity.
L171-172: May 1st – August 30th.
L179: Ref. 39 appears twice.
L187: I find it necessary to also provide a reference on shearwaters, or at least a closely related species: Dias, M. P., Granadeiro, J. P., & Catry, P. (2012). Do seabirds differ from other migrants in their travel arrangements? On route strategies of Cory’s shearwater during its trans-equatorial journey. PLoS One, 7(11), e49376.
L187: You say “fluctuations in migration are closely related to changes in food resource”. What are you referring to? Do you mean bird abundance during migration?
Do you mean fluctuations in areas used (location) during migration? Do you mean fluctuations in dates? Please clarify.
L192-193: I find this phrase unnecessary, and the reference provided not appropriate. Papers on seabird research usually indicate. I would suggest rephrasing to something like “We used chlorophyll concentration (Chla, measured in mg.m-3) as a proxy of marine productivity [ref]. We downloaded satellite-based monthly products at 4 km spatial resolution for the 2003-2017 period”. Ref. is Wakefield, E. D., Phillips, R. A., & Matthiopoulos, J. (2009). Quantifying habitat use and preferences of pelagic seabirds using individual movement data: a review. Marine Ecology Progress Series, 391, 165-182.
L195-198: This is an important methodological issue. I will take this opportunity to raise my concern about the imbalance in the temporal representativeness of the predictors. You only use/have two years of fishing vessel distribution. I wonder how the uneven representativeness of the predictors can affect Random Forest accuracy and predictions.
L202: I find it necessary to cite, at the end of the phrase “…modulating migratory behaviour”, the reference: González-Solís, J., Felicísimo, A., Fox, J. W., Afanasyev, V., Kolbeinsson, Y., & Muñoz, J. (2009). Influence of sea surface winds on shearwater migration detours. Marine Ecology Progress Series, 391, 221-230.
L203: The first reference is in an incorrect format, and I also find this reference not appropriate here. But again you have many to pick from:
• Catry, P., Dias, M. P., Phillips, R. A., & Granadeiro, J. P. (2011). Different means to the same end: long-distance migrant seabirds from two colonies differ in behaviour, despite common wintering grounds. PLoS One, 6(10), e26079.
• Dias, M. P., Granadeiro, J. P., & Catry, P. (2013). Individual variability in the migratory path and stopovers of a long-distance pelagic migrant. Animal Behaviour, 86(2), 359-364.
• Dias, M. P., Granadeiro, J. P., & Catry, P. (2012).
Do seabirds differ from other migrants in their travel arrangements? On route strategies of Cory’s shearwater during its trans-equatorial journey. PLoS One, 7(11), e49376.
But you can also cite this one, which found quite different results from what you were saying:
• Dell’Ariccia, G., Benhamou, S., Dias, M. P., Granadeiro, J. P., Sudre, J., Catry, P., & Bonadonna, F. (2018). Flexible migratory choices of Cory’s shearwaters are not driven by shifts in prevailing air currents. Scientific reports, 8(1), 3376.
L203: Update [29] in the Bibliography, as this paper is already published.

I find that the text on pages 10-11 could be reduced and summarized drastically to ease reading. You may include the details in the Supplementary Material. I think you give too much detail about the predictors used; this could be omitted by simply citing references where each predictor has been used before.

Regarding Table 1:
Indicate in the “Total” row that it is the response/target variable.
Batim: remove year
Since the work is based on citizen science, I would expect to find “day of the week” in the table of predictors, as volunteers usually go birding at the weekend.

Another important detail to me is that the first paragraph of the Results section should be moved to Material and Methods, as an exploratory analysis is part of the methods and drives your next methodological steps. The same applies to the second paragraph of the Results.

Another point to highlight here is that you do not show any data regarding the sampling effort of the citizen science project. Consistency in these projects is key for data quality, and even though RF can deal with this issue, it would be informative to see a plot of sampling effort (for example, days of observation) at the 123 observatories over time.

Statistical analysis
I recommend starting this section by clearly indicating the structure of your data matrix, i.e. clearly defining whether each row is an observatory point, a session (i.e. specific day) from an observatory point, or whatever.
L60: I would recommend citing references already published. If that is not possible, at least provide some extended explanation so that your choice can be understood without the need to find a not yet published paper.
L243: It is not clear to me why you log-transformed the abundance data. Please clarify.
L246-256: You should provide more details about the tuning parameters of RF. Please clearly state the total sample size in the data matrix used as input to the models (number of rows, number of columns), the number of trees you grew, the number of predictors you used for each tree, and the number of samples you selected for each tree. It is not clear whether you are using cross-validation; please try to clarify your explanations in L253.
L249: You refer to RMSE without defining it beforehand. Please briefly explain to readers why you used Root Mean Square Error for parameter tuning. I would expect to see the confusion matrix (predicted vs observed) to evaluate the general accuracy of the RF models. Could you include it?
L257: A bit confusing. Maybe rephrase: “Relative importance of the variables used for predicting shearwater abundance…”

Regarding Variable Importance, considering that most readers of this paper will be ecologists from outside computing science, I would recommend using a metric (and plots) that is easier to follow and understand. Ranking variables with the Gini Index or Mean Decrease Accuracy and showing a barplot illustrating the ranking would be much more understandable than the plots provided by randomForestExplainer. In my opinion, when predictors overlap in the ranking, the visualization provided by this R package is not so effective, as shown in figures 3a and 3b.

Results
As I said before, I think the first and second paragraphs of the Results section should be moved to Material and Methods, as an exploratory analysis is part of the methods and drives your next methodological steps.
L276-277: The way this sentence is written is somewhat confusing, as you only ran two models (pre- and post-breeding migration), but from the text it seems as if you ran many. As building an RF model implies using different training data subsamples, I recommend removing “among the models built with different training data subsamples” from the sentence.
L308-310: This result could be due to a temporal bias in sampling effort at the observatories. You should show, at least, the number of days of observation per Latitude and Year to support this result.
Fig. 5, figure caption: I think it would be more appropriate as: “Predictions of Balearic shearwater spatial distribution and abundance during migration”

Discussion
L349-352: I understand Big Data is a trending topic, but this paper is really far from using Big Data, so I recommend avoiding the use of this term.
L358: My suggestion: “allow to identify general spatio-temporal patterns as data come from different individuals belonging to several populations”. You should discuss in this part the weaknesses of data coming from volunteer programmes, such as temporal gaps, uneven sampling effort in space and time, etc., particularly in L353, where you mention “its characteristic noise” but do not explain anything more.
L361-365: This phrase is too long (5 lines). Please rephrase or split it.
L368: RF is not good at forecasting outside the range of the input data, so future projections based on it could be unreliable, depending on the input data. Moreover, I think this issue (I mean future projections) is out of the scope of this paper.
L371: Ref [74] is about ducks. I would say there are more appropriate references related to seabirds to cite here.
L379: Format of references incorrect.
L382: Be consistent with the term used. Replace “volunteer data” by “citizen science data”.
L393: What do you mean by “larger variability observed”? Which result supports this?
L395-397: This phrase is not clear. Please rephrase it.
L422: This is related to some clarifications needed that I mentioned before.
L433-448: I found the Conclusions section clear and nice.
Bibliography: There are typos spread over the section.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.
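
Note on Reviewer #1's extrapolation point. The reviewer's caution that tree-based models cannot extrapolate outside the range of the training data can be illustrated with a minimal Python sketch. It is not the authors' model: it fits a single least-squares regression tree (the building block of a random forest, which averages many such trees and so inherits the same bound) to synthetic data, and all names and numbers in it (`fit_tree`, `predict`, `rmse`, the linear toy response) are assumptions made for illustration. RMSE, the Root Mean Square Error mentioned under L249, is computed inside and outside the training range.

```python
import math


def _sse(values):
    """Sum of squared deviations from the mean (the split cost)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _build(pairs, min_leaf):
    """Recursively split sorted (x, y) pairs at the least-squares threshold."""
    ys = [y for _, y in pairs]
    leaf = ("leaf", sum(ys) / len(ys))  # a leaf predicts the mean response
    if len(pairs) < 2 * min_leaf:
        return leaf
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = _sse(ys[:i]) + _sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return leaf
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return ("split", threshold,
            _build(pairs[:i], min_leaf), _build(pairs[i:], min_leaf))


def fit_tree(xs, ys, min_leaf=1):
    """Fit a one-predictor regression tree (hypothetical helper)."""
    return _build(sorted(zip(xs, ys)), min_leaf)


def predict(tree, x):
    """Walk down the tree until a leaf is reached."""
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]


def rmse(observed, predicted):
    """Root Mean Square Error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Synthetic training data: predictor 0..20, response = 2 * predictor (0..40).
train_x = [float(i) for i in range(21)]
train_y = [2.0 * x for x in train_x]
tree = fit_tree(train_x, train_y)

inside_x = [2.0, 7.0, 15.0]     # within the training range
outside_x = [30.0, 40.0, 50.0]  # beyond the training range

rmse_inside = rmse([2.0 * x for x in inside_x],
                   [predict(tree, x) for x in inside_x])
rmse_outside = rmse([2.0 * x for x in outside_x],
                    [predict(tree, x) for x in outside_x])

print("predictions outside range:", [predict(tree, x) for x in outside_x])
print("RMSE inside:", rmse_inside, "RMSE outside:", rmse_outside)
```

Every prediction beyond the training range equals the largest training response, because a leaf can only return a mean of responses it has seen; the error therefore grows with distance from the training data, which is why projections to unobserved conditions need care.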
Revision 1
PONE-D-19-29580R1

Citizen science for predicting spatio-temporal patterns in seabird abundance during migration

PLOS ONE

Dear Dr Martin,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Jun 07 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:
Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. We look forward to receiving your revised manuscript. Kind regards, Vitor Hugo Rodrigues Paiva Academic Editor PLOS ONE [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #2: (No Response) ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Partly ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: No ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). 
The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified. Reviewer #1: Yes Reviewer #2: No ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: I congratulate the authors on the changes made throughout the manuscript, which is now much easier to follow and understand than the previous version. In my opinion, they successfully resolved most of the comments made on the first submission. Nevertheless, I have some comments on this new version. Although most of them are minor, I really do think the authors should address them to improve the manuscript for publication. My comments are detailed below. Line numbers correspond to those of the version with “Tracking changes” activated. L36: I suggest "Random Forest regression models" L54: ecosystemS, “s” in plural. L65: have been received (remove “been”) L74: I suggest replacing “limit” with another word such as “undermine” or “affect”. 
L90: I suggest: species migratory range due to the variety of migratory strategies at colony- and individual levels. L98: replace “Therefore” with another term to avoid starting in the same way as the previous sentence; it would make reading more fluid. L98-99: I suggest: even providing very detailed information on INDIVIDUAL SEABIRDS’ movements, have a limited ability… L101: My suggestion: Census methods, in contrast, can provide a VALUABLE overall picture, even though they do not allow to detect birds when they use areas out of human sight. L106-131. Just another suggestion to ease reading. You used “however” and “in this sense” repeatedly to start sentences. Maybe you can replace some of them for readability. L130: as you say “almost unexploited”, I would expect to see at least one reference using Trektellen. If there are no publications, it may be better to say “have remained unexploited by the scientific community to the best of our knowledge” or something similar. L143: Replace “geolocator archival tags” with “geolocation archival tags”, which is the correct form. L167: may provide a useful records (delete “a”) L178-179: the correct name does not include “Individual”, just “Species Distribution models”. L263: How many sighting points from eBird? L275: Why “Environmental” with a capital letter? L277: I suggest putting a comma before “thus” to ease reading. This advice also applies to many other instances of "thus" throughout the manuscript. L288: Caption of Table 1. For consistency you should say “Description of environmental predictors.”. L393: Classification and Regression TreeS, add “s”. L414-417: Honestly, to the best of my knowledge there is no need to apply any kind of transformation to the response variable to run Random Forest models. Unlike GLMMs and other parametric methods, random forests do not make any assumption about the distribution of the data or residuals, so whether the count data fit a Poisson distribution is irrelevant here. 
The same applies to the predictive power of the model: it will not change because the response variable is transformed. Transforming the predictors by standardization is a different matter, and it is not what was done here. To check my previous knowledge, I did a little research trying to find support for your statement, with no luck. Maybe I am wrong, so I sincerely encourage the authors to provide a reference supporting the claim that transforming the response variable improves the accuracy and increases the predictive power of Random Forest models. If you do not find such support, I think the log transformation could simply be justified as improving the interpretability of the data (which, to me, should be enough), but not as improving the accuracy and predictive power of Random Forest. L515: Regarding Fig. 4. Please (1) indicate in the legend that values are in ln, or back-transform them to the actual scale, (2) change year to a factor so that years on the X axis do not appear with decimal digits. L532: Caption of Figure 5. I think “spatial distribution of abundance” can be confusing for readers. My suggestion for the entire caption to ease readability: “Predicted abundance of Balearic shearwater across its distribution range during migration. Colour gradient indicates the percentage of the maximum predicted abundance across the study area on the fifteenth of the month, from May to December, from environmental conditions (see Table 1) occurred in year 2017." L553: “Our results showed (…) successfully applied to describe the migratory distribution (AND ABUNDANCE?) of seabird species accurately.” L579: increaseD, add "d" L587-596: “In contrast…(…) in the long-term.” Sentence too long, more than five lines, and hard to read. Please split it. L636-637: “For instance, locations with tailwinds, high abundance level and low chlorophyll concentration”, reorder, as in the next sentence, to: “For instance, locations with high abundance level, tailwinds, and low chlorophyll concentration”. 
L650: “But the major achievement of our modelling attempt is the high temporal resolution that we achieved with our models”. It may be better to rephrase this sentence to avoid tongue twisters, for instance by substituting “achievement” with “accomplishment” and “modelling attempt” with “analysis” (in fact, as you have already performed the modelling, it is no longer an attempt, is it?). L676: “the two migration movements”. I suggest changing this to “the two migratory periods” for clarity and for consistency (see your lines 266-267). L711: In relation to successful approaches developed to identify marine Important Bird Areas, instead of reference [69] I find it much more appropriate and relevant to cite the work developed by SEO/BirdLife, a fortiori, in a work regarding the Balearic Shearwater. Either the working-example paper https://www.sciencedirect.com/science/article/abs/pii/S0006320711004745 or the book Arcos, J. M. (2009). Áreas importantes para la conservación de las aves marinas en España: Important areas for the conservation of seabirds in Spain. Sociedad Española de Ornitologia (SEO/BirdLife). Reviewer #2: First of all, I’d like to congratulate the authors on the study. The study is definitely interesting and relevant, as it aims at (1) a better understanding of the migration pathways of a highly threatened seabird species about which general knowledge presents several gaps, and (2) dealing with large citizen-science data banks. Such data banks are very useful, and studies that aid in a better understanding of how to deal analytically with citizen science in order to provide biologically and ecologically sound information are always welcome. I have no doubt that this study deserves attention and has the potential to make an important contribution on the two topics I mentioned above. Therefore, I read the manuscript carefully and checked with attention the criticisms of the previous reviewers and the authors’ response. 
I know the reviewing process can be daunting sometimes, particularly when a first round of reviews has already been done, and then a new reviewer comes in and makes further criticisms that were not there before. Unfortunately, I think that is going to be my role here. In my view, the main weaknesses of this study are: the details of the analytical approach need to be better presented; the presentation of results needs to be reviewed, as several results would be better presented if figures were made differently and/or a different set of statistical outputs of the RF models were used; and, finally, the most critical issue: I ask the authors to reconsider using Lat and Long as factors in the modelling. Based on figure 3, both Lat and Long are the variables that explained most of the models’ power of prediction. What is the consequence of that for the findings? If the authors think the use of LatLong is justified, they should explain it carefully in the methods and discuss it more deeply. But for me, it is clear from Fig3 that LatLong considerably reduced the importance of the environmental variables and that the model is predicting spatial occurrences more than habitat use. Below is a list of the detailed comments that the authors should address. I ask the authors for patience and hope they understand that the comments and criticisms aim to improve the final manuscript. Have a good review; I look forward to reading the revised version! L52 – Seabirds instead of “marine birds” L54 – I suggest deleting “migrate” from this sentence. Start the next sentence with something like “migratory seabirds are particularly susceptible to a large number of stressors given the variety of habitats they use throughout their year-cycle…” L60 – I suggest rephrasing to “Stopovers are key sites; conditions experienced by seabirds in stopover can affect individual survival throughout migration and drive population dynamics” or some similar idea. 
L76 – “and in relation to specific individual traits”: this is a bit “loose” within the sentence. I suggest you describe which those individual traits are, or delete this part of the sentence. For instance, at the start of the sentence: “This is particularly true for long-lived seabirds with a defined set of traits such as….. which have a great capacity… “ L96 – Rephrase “Prominent electronic citizen science data banks include eBird and Trektellen xxxx, two biodiversity…”. Additionally, why are they prominent? Did the cited studies compare them with other data banks? Are they more popular, more commonly used, or something else? L104 - The BirdLife factsheet indicates several parameters explaining why this shearwater is critically endangered and why actions to improve knowledge on population trends are urgent, but I did not find any reference to the species being one of the most threatened species of seabirds. Maybe rephrase, emphasizing that information from [26] allows placing this species among the most threatened seabirds in the world, as it possibly is. L151. Is it possible to plot the position of known breeding colonies of the species? Line 170. This is key: correctly identified when sighted. Some seabirds at sea are very difficult to distinguish; therefore, citizen science data should be used carefully when studying difficult-to-identify species. I hope that this is acknowledged in the discussion. L183. Data from 2005 to 2017 was used because it is the most representative period in terms of sampling. 2018 was excluded because it does not have a homogeneous sampling throughout the year. Mention it here. L191. It seems you used year and geographical position as factors in your model (judging from figure 3). It is not described in the methods how those variables entered the model. I would not recommend using Lat and Long as factors in this case, and I really think it doesn’t help your study at all (if it does, please provide an explanation). 
Lat and Long are not environmental variables used by the birds, and given that both were the most influential variables in the model, it is likely that the model outputs are predicting the occurrences rather than the relation between abundance and environment. Don’t you agree? If this is true, the model should be run again without lat and long as factors. You could, a posteriori, use a probability of occurrence based on latlong to filter the predictions, as Hindell et al. (2020; DOI 10.1038/s41586-020-2126-y) did. L193. In practical terms, zero was suppressed, right? L197 – “For a detailed description of the set of predictors included in the models, please see the Supplementary Material.” Instead of this, you could add in line 195 “…predictors (Table 1, Supp. Material).” Are the references in the supplementary material the same as those in the main text? Shouldn’t the supplementary material have its own reference list? L211. Where exactly in the results? Add a reference to the supplementary material. L212-221. I didn’t understand this section. You applied a single machine-learning-based technique (random forest), right? You start the sentence referring to it in the plural. Then you present a series of models and an equation, apparently presented in three studies [43,44,45]. If this is true, this paragraph could be simplified to say that, among a handful of techniques, RF was identified as one of the most accurate in at least three studies based on a comparison of RMSE. The reader can then check the studies that you cited. L242 – 248. This is not clear and needs further explanation. You used bootstrap to calibrate the model, and the algorithm stopped when the best “solution” was reached? Is that why you have a variable number of combinations (1-15)? It is also usual to run a fixed number of trees, check the increase in accuracy and, a posteriori, select the trees that produced increasing accuracy without substantial overfit. Is that why you had a variable number of trees, between 100 and 500? 
I imagine you used 500 trees for all the possible variable combinations and used the approach I described to select the best solutions, and the minimum number of trees required to achieve that was 100. Is that right? Or did I completely misunderstand? Is the RMSE calculated over the OOB error? It did not seem that you did so. In my opinion, a plot showing OOB vs RMSE (or AUC, or other…) over all iteration trees could provide such information and justify why a variable number of trees is used in the final, averaged abundance output (I briefly checked the packages you used, and there are some ways of doing that: http://topepo.github.io/caret/model-training-and-tuning.html). If this is not what you did and I misunderstood, please provide alternative explanations. L272. The order of the figures was quite confusing… figure 1 was the last in the PDF… putting the legends with the figures instead of merging them into the text would facilitate reviewing. I hope the authors consider this in further revisions or submissions. L273. You said in the methods that you removed correlated variables, and in the end you used all variables because no variables were correlated. Change this in the methods. L277. Figure 2 points to a small variability in power of prediction based on R2 values, more or less between 0.49 and 0.53, and slightly more accurate predictions during RFPost, as the mean error was lower. It is not clear to me how this indicates substantial variation among models. Please explain. Further information is also required: are those results from all the iterated trees? L279-281. Please explain what lagged-iterated differences mean and how they were calculated. Is this a lag between iterations or annual variability? Check the next comment. L299. Was year entered as a factor? Or were models run separately for each year? The methods were not clear on how annual variability was used in the models. 
If year entered as a factor in the analysis, how did you deal with the fisheries time coverage being different from that of the species data? You probably used fishing as a fixed, non-dynamic variable. How do you justify that? L306. What is minimum depth? It is crucial for understanding figure 3. An alternative (and, in my opinion, more straightforward) way of analyzing variable contribution is to plot the change in accuracy when the variable is absent from the model. It seems that the Caret package has a standard function to do that: ‘varImp’. L312. I am not sure whether figure 4 contributes to the overall results. It could be placed in the Suppl. Material. How annual variability was used in the model is not clear either. L325. You said in the methods that you used two periods of migration, but here you present eight different periods. Can you group the information into only those two periods? More detailed results could be placed in the Supp Material, with a general figure in the main text highlighting the detected stop-overs in the two different periods. Known breeding and non-breeding areas in the figure would also be very useful. L325. The figure suggests part of the population remains in the Mediterranean year-round. Is that true, or is it a product of the modelling? L348-351. It is important to highlight that boosted regression tree methods such as the one used here can artificially inflate accuracy with increasing iterations, hence the need to evaluate how fit and accuracy vary with iterations. It is possible to have increasing accuracy and loss of fit to the point that the model starts predicting the response itself (occurrence or abundance) disregarding the factors; therefore one should use this in order to select the optimum number of iterations to be used in the final model outputs. This is not clear in the methods, and it is not discussed either. L386-390. Nowhere in your results are there estimated response curves. So please cite a reference for this statement. L389. 
It would be very useful to place on the map in figure 1 the sites you mention here, such as the Alboran Sea or the Gulf of Cadiz, so that readers not familiar with this region of the globe can situate themselves spatially. L392-394. I thought the results indicated differences in migratory routes between periods. Did I misunderstand? L397. Yet, how temporal variability was used in the model is not clear. L417. Again, without estimated response curves, I don’t think it is possible to reach such a conclusion. A variable having high importance in the modelling doesn’t mean the birds had higher abundance at the higher values of the variable; this is particularly true when using a model such as RF, which does not necessarily assume linearity. L423. How was variability in accuracy (which was not as large as the authors claimed) driven by the link with food availability? It is not clear. This needs a better explanation. ********** 7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: Yes: Lucas Krüger [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. 
Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step. |
Revision 2 |
Citizen science for predicting spatio-temporal patterns in seabird abundance during migration PONE-D-19-29580R2 Dear Dr. Martin, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Vitor Hugo Rodrigues Paiva Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. 
Reviewer #2: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #2: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. 
(Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #2: I think the authors addressed all my comments and clarified some of my concerns about the methods, particularly the use of Lat and Long in the models (which was my main concern) and the further description of the techniques applied. ********** 7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #2: No |
Formally Accepted |
PONE-D-19-29580R2 Citizen science for predicting spatio-temporal patterns in seabird abundance during migration Dear Dr. Martin: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Vitor Hugo Rodrigues Paiva Academic Editor PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.