Peer Review History
Original Submission November 6, 2023 |
---|
PONE-D-23-36575
Comparing biased random walks in graph embedding and link prediction
PLOS ONE
Dear Dr. Silva,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
You will see that both reviews recommend that you revise your manuscript, so please consider making the suggested changes. After my own reading of the manuscript, I agree with them in that the paper is interesting, well-written in general and has potential for publication after a major revision. Please consider making the suggested changes to better highlight the contribution of your work.
If you feel you can comprehensively address the reviewer's concerns, please provide a point-by-point response to these comments along with your revision. Please show all changes in the manuscript text file with track changes or color highlighting. If you are unable to address specific reviewer requests or find any points invalid, please explain why in the point-by-point response.
Please submit your revised manuscript by Jun 08 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Pablo Martin Rodriguez
Academic Editor
PLOS ONE
Journal Requirements:
Additional Editor Comments (if provided):
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #2: Yes
**********
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes
**********
3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #1: Yes
Reviewer #2: No
**********
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes
**********
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: The authors compared how different types of random walks impact link prediction tasks. The paper is well-organized and well-written. However, I still have a few major concerns that I listed below:
1) Throughout the paper, the authors mention that they are comparing biased random walks. However, the traditional random walk is not biased; in some parts of the text, it is referred to as a biased random walk. Note also that P(v,u) does not depend on v, but only on u. For instance, see the title and the first paragraph of the Methodology section, "We leveraged nine biased random walk algorithms ..."
2) Although enough for the paper's purposes, the definition in the Section "Random walk" is not precise. For instance, the random walk is defined as a "sequence of nodes traversed during a walk through a network." However, this is the definition of a finite walk. To be a random walk, you have to ensure that the sequence is random. Also, the random walk can be infinite. Being finite is not a necessary condition.
3) In the true self-avoiding random walk on page 8, the vector f_x is defined as the "frequency of visits to x." How do you define the frequency? Would it be the number of times the random walker passes through node x divided by the total number of steps? After equation (5), it seems to be the number of visits. If I understood correctly, both definitions should lead to the same transition probability, but it would be better to clarify this point to make the paper more consistent and easier to read.
4) The Node2Vec description is not clear. Perhaps my questions are due to the fact that I am not so familiar with this random walk, but it would be better to clarify this to make the paper more accessible to a wider audience. My main question here is how to write this random walk transition probability in the form of Eq. (1). Since you have two parameters (p and q), it is not clear how the transition probabilities sum up to one, i.e., \sum_v P(u,v) = 1 for a non-lazy random walk.
5) My main concern about the paper is the importance and impact of the paper's results. In the last paragraph of the "Quality Comparison" section, the authors mention: "Our finding suggests that the outcomes of link prediction are more dependent on the intrinsic characteristics and properties of the network itself, rather than the specific choice of the random walk applied." However, this seems to be a trivial result. Moreover, this makes us ask more questions, for example, what are the features that make it easier/harder to improve the results?
One possibility would be to test for the impact of correlations, which could be tested by doing a similar experiment using the configuration model as a null model, generating an uncorrelated network, and comparing the quality of the original and the uncorrelated version (maybe for a few networks).
6) The database. Most of the analyzed networks are relatively small. So, what is the impact of the network sizes on the quality of the predictions? Could it be that the results change for larger networks? The authors should clarify that or provide evidence that their results should hold for larger networks.
7) The walk similarities correlations should be better described. I could not understand the analysis very well. For example, in Fig. 5b, at the bottom, there is "Pearson 1.0," however, this coefficient does not refer to how the points in the plot are correlated. Note that a Pearson = 1.0 should be a perfect straight line.
Minor details:
a) On page 7, when defining the maximum walk length, the authors mention, "in this project." Perhaps substituting the word project for paper would be slightly better.
b) There is a typo in Eq. (6), which is defined using the Greek letter rho, while the letter p is used in the following paragraphs.
c) After Eq. (7), there is a paragraph space that should be removed.
d) Fig. 4 is a bit confusing. The plot is continuous, but the x-axis is discrete. I understand that this is done for visualization purposes, but we do not know which network is represented by each color, and it is still difficult to follow the lines. Perhaps just the points would be enough.
In general, the authors should strengthen their results by emphasizing the importance of their findings and how they can be helpful for future research. Points 5 - 7 are my major concerns, while points 1-4 are important definitions that can be easily fixed. Thus, I would suggest a major revision.
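Editorial note on Reviewer #1's point 4: the sketch below illustrates one way the node2vec transition probabilities can be made to sum to one. It assumes the standard node2vec definition (unnormalized weights 1/p, 1, and 1/q for next nodes at distance 0, 1, and 2 from the previously visited node, on an unweighted graph), which may differ from the manuscript's notation. The function name `node2vec_transition_probs` and the toy graph are hypothetical and serve only as illustration.

```python
def node2vec_transition_probs(adj, prev, curr, p=1.0, q=1.0):
    """Return {next_node: probability} for a walker at `curr` that came from `prev`.

    `adj` maps each node to the set of its neighbours (unweighted, undirected).
    Assumption: standard node2vec bias, not necessarily the manuscript's form.
    """
    weights = {}
    for x in adj[curr]:
        if x == prev:            # distance 0 from prev: step back
            weights[x] = 1.0 / p
        elif x in adj[prev]:     # distance 1 from prev: stay nearby
            weights[x] = 1.0
        else:                    # distance 2 from prev: move outward
            weights[x] = 1.0 / q
    total = sum(weights.values())  # normalizing constant
    return {x: w / total for x, w in weights.items()}


# Toy graph (illustrative only): triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
probs = node2vec_transition_probs(adj, prev=0, curr=2, p=0.5, q=2.0)
uniform = node2vec_transition_probs(adj, prev=0, curr=2, p=1.0, q=1.0)
```

Because the weights depend on the previous node, the walk is second-order: the normalization is carried out for each (previous, current) pair rather than for each node alone. Under this definition, setting p = q = 1 gives every neighbour the same weight, so the step reduces to the traditional unbiased random walk, which bears on Reviewer #2's remark about N2V(1, 1).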
Reviewer #2: The paper addresses the effect of the choice of the random walk for graph embedding. Two main experiments are performed: 1) the ability of the embedding to predict the missing edges in the graph; and 2) the similarity of the embeddings (in terms of the correlation of cosine similarities of the endpoints of the edges).
The subject is interesting and the paper is well written. The experiments are very objective and the results are clear. Paper organization is appropriate.
I agree with most of the conclusions of the paper. However, authors state that their results "suggest that we can potentially recover the underlying network structures from such data, regardless of the nature of the walks performed, adding to the versatility of network science in analyzing complex systems." I especially disagree with this statement. Although different random walks lead to similar performance in link prediction and the embeddings share some similarities, the leap to the conclusion that the underlying network structure can be recovered from the embeddings independently of the random walk seems too strong. I suggest to tone down this statement; maybe stating as a hypothesis for future works.
For me a major issue is the lack of information about model training setup. Authors state "hyperparameters of the embedding model were optimized to ensure optimal and consistent performance." What does that even mean? How can you guarantee optimality? What is consistency here? Please provide more details about the training setup.
There are also minor issues that should be addressed:
- The way authors talk about the random walk mechanism in the node2vec algorithm is sometimes confusing: "different configurations of node2vec." Actually, they parametrize the underlying random walk in node2vec with five different settings. I suggest to clarify this.
- node2vec is sometimes written as "Node2Vec" and sometimes as "node2vec". Please standardize the notation.
- Authors use notation Log, log, and ln. Please standardize and clarify the notation.
- p. 2, "Our dataset revealed" -> "Our experiments revealed"
- Sentence "This becomes particularly insightful for datasets such as textual..." in the Introduction is not clear.
- When explaining second-order proximity (about LINE), what does "similarity between p_u and p_v" mean mathematically?
- Authors state that ID "outperforms other degree-biased walks in terms of time efficiency." What does this mean? In a similar vein, what does "TSAW achiving the best learning curve" mean?
- For me N2V(1, 1) *is* RW. Why test both? Is it because the different embedding strategies?
- What is the "algorithm equipped with a built-in function"?
- What is \rho in eq. (6)?
- In the sentence "we applied normalization procedure to scale the walk similarities between 0 and 1", what is walk similarity?
Some minor comments:
- For me, 25% of missing edges is a lot. Is there any reason for this choice?
- Just a matter of taste, but I think eq. (2), (3), (4) and (5) could be written for \tau_uv instead of P(v|u). This way you make use of eq. (1) and the verbose denominators are avoided.
- Is the choice of \alpha = 200 appropriate for small networks (n < 200)? I am not sure about this.
**********
6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
**********
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site.
Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
Revision 1 |
Comparing random walks in graph embedding and link prediction
PONE-D-23-36575R1
Dear Dr. Silva,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.
An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Pablo Martin Rodriguez
Academic Editor
PLOS ONE
Additional Editor Comments (optional):
Reviewers' comments: |
Formally Accepted |
PONE-D-23-36575R1
PLOS ONE
Dear Dr. Silva,
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.
At this stage, our production department will prepare your paper for publication. This includes ensuring the following:
* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission,
* There are no issues that prevent the paper from being properly typeset
If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.
Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
If we can help with anything else, please email us at customercare@plos.org.
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Professor Pablo Martin Rodriguez
Academic Editor
PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.