Peer Review History

Original Submission: March 25, 2025
Decision Letter - Yuanning Li, Editor

PCOMPBIOL-D-25-00570

Human visual grouping based on within- and cross-area temporal correlations

PLOS Computational Biology

Dear Dr. Chen,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days (by Sep 05 2025, 11:59 PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Yuanning Li

Academic Editor

PLOS Computational Biology

Daniele Marinazzo

Section Editor

PLOS Computational Biology

Journal Requirements:

1) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type 'LaTeX Source File' and leave your .pdf version as the item type 'Manuscript'.

2) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: 

https://journals.plos.org/ploscompbiol/s/figures

3) We have noticed that you have uploaded Supporting Information files, but you have not included a list of legends. Please add a full list of legends for your Supporting Information files after the references list.

4) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.

a) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

b) If any authors received a salary from any of your funders, please state which authors and which funders.

If you did not receive any funding for this study, please simply state: "The authors received no specific funding for this work."

5) The file inventory includes files for Figures 5a and 5b. We would recommend either combining these into a single Figure 5.tiff file with separate internal panels, or renumbering them as individual figures, as we are not able to publish multiple components of a single figure as separate files.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This study leverages vision transformer models to create stimuli where the temporal correlation within and between regions of a stimulus can be robustly manipulated. They first explain the details of this model’s implementation and then test humans’ ability to distinguish between images with two distinct subregions and images without distinct subregions. They find that participants are sensitive to global similarity relations and do not merely depend on pairwise similarity comparisons between two elements. Finally, they develop a computational model that, when perceptual noise is included as a parameter and fit to human performance, predicts human performance on the psychophysical task quite accurately.

I liked this paper and think that it makes good contributions to the study of midlevel vision. I have a few suggestions that I would like to see addressed before I recommend it for publication.

First, and most importantly, I think the paper would be improved by introducing the questions addressed in Studies 2 and 3 earlier and more clearly. In particular, after Study 1 (p. 15), the authors move very quickly into the experimental parameters for Study 2. This reader would have been helped by a more thorough explanation of what was being tested and what the relevant hypotheses were going into Study 2.

For Study 3, I was a little unclear about whether the naïve model was the best foil for the parameterized model. It seems predictable that a model that directly computes cosine similarity will perform near-optimally and make few humanlike errors. Under what hypothesis would human performance match the naïve model as well as, or better than, the parameterized model? Is there another model, potentially a non-graph-cut model, whose predictions of human performance would be more illuminating?
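For concreteness, the kind of noise-free "naïve observer" discussed above might be sketched as follows. This is purely an illustrative reconstruction, not the manuscript's actual model: the function name, the group-mask input, and the within-minus-between decision rule are all assumptions.

```python
import numpy as np

def naive_2ifc_choice(interval_a, interval_b, group_mask):
    """Hypothetical noise-free observer for a 2IFC segmentation task.

    For each interval, compute pairwise cosine similarities between
    patch time courses and score it by (mean within-group similarity)
    minus (mean between-group similarity); choose the interval with
    the larger gap. Illustrative sketch only.
    """
    group_mask = np.asarray(group_mask, dtype=bool)

    def gap(timecourses):
        X = np.asarray(timecourses, dtype=float)
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        S = Xn @ Xn.T                                   # cosine similarities
        same = np.equal.outer(group_mask, group_mask)   # same-group pairs
        off_diag = ~np.eye(len(X), dtype=bool)
        within = S[same & off_diag].mean()
        between = S[~same].mean()
        return within - between

    return 0 if gap(interval_a) > gap(interval_b) else 1
```

Because such an observer reads the similarity structure directly, its gap is essentially the generative parameter itself, which is why it performs near-optimally and produces few humanlike errors.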

Finally, I think the authors could make clearer what was learned about human perception in the General Discussion. The GD does have some of this, and I like what is written, but given the breadth of readership at PLOS CB, I think many readers would benefit from a little additional explanation of the key takeaways.

Minor:

On page 6 (first bullet point), the authors say that W is short for "weight". Correct me if I'm wrong, but should this be "width"?

Reviewer #2: In this manuscript, the authors tested an interesting research question: how does the human visual system segment image areas using spatial and temporal similarities? To answer this question, the authors introduce a three-stage programme aimed at understanding how global temporal-similarity structures guide visual segmentation. (1) A Vision-Transformer-based generator (Exp 1) creates dynamic 8 × 16 noise images whose only diagnostic cue is a user-specified matrix of within-area and cross-area cosine correlations. (2) In a 2IFC psychophysical task (Exp 2), observers decide which of two 1 s video clips contains a segmentable texture (created in Exp 1); performance grows monotonically with the difference between the within-group similarity and the between-group similarity. (3) A computer vision model (Exp 3) captures the human performance and illustrates the detailed computational mechanisms of human visual texture segmentation. The authors conclude that the human visual system performs a near-global, similarity-weighted computation that can be approximated by a noisy spectral-clustering algorithm.

The work is timely, technically ambitious and, in principle, sound. The stimulus generator is elegant (though some justification is needed), the behavioural gradients are clear (although there are some limitations), and the modelling framework offers a quantitative bridge between vision science and computer vision methods. However, several inconsistencies in the methods section, together with unanswered questions about statistical treatment and parameter sensitivity, currently limit interpretability. Addressing the issues below will strengthen both the empirical foundation and the theoretical claims.
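The "noisy spectral-clustering" computation summarized above might be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name, the form of the perceptual noise (additive Gaussian on the affinities), and the two-way Fiedler-vector cut are all hypothetical choices.

```python
import numpy as np

def noisy_spectral_segmentation(patch_timecourses, noise_sd=0.05, seed=0):
    """Split patches into two groups from their temporal correlations.

    Sketch of a noisy spectral-clustering idea: pairwise cosine
    similarities between patch time courses form an affinity matrix,
    perceptual noise is modelled as additive Gaussian noise on the
    affinities, and the sign of the Fiedler vector of the normalized
    Laplacian gives the two-way cut.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(patch_timecourses, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                                   # cosine similarities
    S = S + rng.normal(0.0, noise_sd, S.shape)      # "perceptual" noise
    S = np.clip((S + S.T) / 2.0, 0.0, None)         # symmetrize, keep >= 0
    np.fill_diagonal(S, 0.0)
    d = S.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(S)) - D_inv_sqrt @ S @ D_inv_sqrt  # normalized Laplacian
    _, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]                         # 2nd-smallest eigenvector
    return (fiedler > 0).astype(int)                # two-way labelling
```

On synthetic time courses with strong within-group and weak cross-group correlation, the sign pattern of the Fiedler vector recovers the two areas; raising `noise_sd` degrades this recovery, which is the lever that lets such a model be fit to human error rates.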

Major Concerns:

1. The stimulus generation methods (Exp 1): I found that many details are missing. First, if I understand correctly, each individual frame is indistinguishable between areas, and only the temporal information allows the areas to be distinguished; am I right? However, I am a bit worried by the consistent and clear cut-off between the two areas (easy to compute).

For instance, a recent paper (Vacher, J., Launay, C., Mamassian, P., & Coen-Cagli, R. (2023). Measuring uncertainty in human visual segmentation. PLoS Computational Biology, 19(9), e1011483.) tested a similar research question, but did not impose such a rigorous, pre-defined boundary (or box).

2. The human data collection needs more details. (1) How many subjects? The authors state there are 4 (2 authors + 2 naive subjects); however, there are 5 participants' data in Figure 5 (excluding the 'general' one). I find this very concerning. (2) Details of the procedure: Experiment 2 is very long (7.5 hrs); please specify the duration of each block and the rest intervals between blocks for future replication. (3) The major concern lies here: if half of the subjects are authors, then even if they cannot figure out the 'correct answer' simply from the images (thanks to the nice image generation in Exp 1), they are still aware that the stimuli were separated at the centre. So it is very possible for them (who constitute half of the data!) to rely simply on the central part of the videos/stimuli (not necessarily the whole image) to make the judgement. Yes, the stimuli are that 'uniform'.

Moreover, the design of Exp 2 is a bit off: why 2IFC? Is it possible that both videos are highly separable but observers still have to choose one, or vice versa? Some additional details are necessary.

3. Exp 3's data suggest that the SNR is an important factor; see the minor concerns for more details. I am a bit worried that (as the authors state in the discussion) even this model cannot read the 'temporal' aspect, so the stimuli are flattened from 3D to 2D. Does this affect the interpretation of the data?

Minor concerns:

1. The behavioural data suggest that the separation ratios are “Luminance > Colour > Phase”.

This difference is compatible with known channel SNR differences (as mentioned in Exp 3), yet the manuscript currently leaves the reader wondering whether uncontrolled cues leaked through stimulus generation. A discussion or justification is necessary.

2. Does one need the ViT to create the stimuli? Could these stimuli be created with a rigorous, pre-defined algorithm? Is the ViT necessary here?

Reviewer #3: 1. Originality

To my knowledge, no previous work has focused on the temporal cues of spatial segmentation.

2. Innovation

The work proposes a method to generate well-controlled stimuli for conducting a precise study of visual segmentation.

I've spent some time trying to generate these stimuli without the proposed method to convince myself that the method is necessary to generate appropriate stimuli.

3. High importance to researchers in the field

The results are quite remarkable and relevant for the studies around perceptual organization

4. Significant biological and/or methodological insight

Standard psychometric methods

5. Rigorous methodology

Yes

6. Substantial evidence for its conclusions

Yes

What are the main claims of the paper and how significant are they for the discipline?

The main claims are:

- Humans are able to successfully segment images based on the spatial integration of temporal correlation cues.

- The data can be explained by a graph-based model that integrates pairwise correlations across the image.

Are these claims novel? If not, which published articles weaken the claims of originality of this one?

I am not aware of other sharp results like this regarding visual segmentation.

Are the claims properly placed in the context of the previous literature? Have the authors treated the literature fairly?

Yes

Do the data and analyses fully support the claims? If not, what other evidence is required?

Yes

Would additional work improve the paper? How much better would the paper be if this work were performed and how difficult would it be to do this work?

In many places the authors talk about similarity or correlation; they must specify whether it is temporal (or not)! (e.g., l. 461)

Section "Design and procedure" (l. 383): the writing must be improved/simplified. I had the impression of reading the same information twice, written in slightly different ways.

The section contains redundancies:

- "introduce three different attributes" / "we examine three attributes"

- "viewed two consecutive 1-second video clips" / "view 1.000 ms video clips"

The number of repetitions for each of the 45 correlation pairs is not indicated.

Section "Model architecture" (l. 487): Isn't it F'_sig that is used to compute S_sig?

Typos :

l51 : computational model

l307 : missing ref for figure

l374 : it seems you have 5 participants not 4

l513 : it's unclear what is the normalized Laplacian matrix

l527 : \bar T is the target not T(h,w)

Questions :

- Eq 9: why absolute values instead of squared values? Eq 11: why is the 1-norm used? Isn't it a regular Euclidean norm? (In fact, the question holds for all norms!)

- Figure 5: why are the checkerboards not straight? It is unclear what these odd shapes represent.

Complementary references for discussion

- Vacher, J., Launay, C., Mamassian, P., & Coen-Cagli, R. (2023). Measuring uncertainty in human visual segmentation. PLoS Computational Biology, 19(9), e1011483

- Wallis, T. S. A., et al. (2019). Image content is more important than Bouma's Law for scene metamers. eLife, 8, e42512.

Has the author-generated code that underpins the findings been made publicly available?

Yes

Are details of the methodology sufficient to allow the experiments to be reproduced?

With the minor corrections yes

Is the manuscript well organized and written clearly enough to be accessible to non-specialists?

Overall yes

Does the paper use standardized scientific nomenclature and abbreviations? If not, are these explained at the first usage?

Yes

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Attachment
Submitted filename: Point to Point Response_ver3-1.docx
Decision Letter - Yuanning Li, Editor

PCOMPBIOL-D-25-00570R1

Human visual grouping based on within- and cross-area temporal correlations

PLOS Computational Biology

Dear Dr. Chen,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 30 days (by Oct 22 2025, 11:59 PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Yuanning Li

Academic Editor

PLOS Computational Biology

Daniele Marinazzo

Section Editor

PLOS Computational Biology

Additional Editor Comments:

The reviewers are in general happy with the revisions, with only a few minor points that need to be addressed before final acceptance. Please consider these comments in a final round of revision.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I thank the reviewers for addressing my concerns. I think the paper is suitable for publication apart from a few typos.

p. 7 (ll. 188-189): "In our implementation, we used several four stacked ResidualBlock2D modules, each consisting of convolution, normalization (InstanceNorm2D), and GELU activation, with the residual connection between two layers." This should be corrected.

p. 17: As reviewer 2 pointed out, the demographic information is not complete. There were 5 participants, but only four participants' demographic information was reported.

Reviewer #2: In this revision, the authors followed the comments and got the manuscript into better shape. They added some very important missing information to the manuscript. The updated neurophysiology part is good. Also, the updated discussion section greatly improves the readability and clarity of the manuscript.

I found the updated general discussion and the method section very good. I think most of my concerns have been addressed. But I still want to point out that in current human visual cognition/perception studies, we tend to use larger sample sizes. While I believe the authors carried over this tendency from animal (primate) studies (such as testing many trials on two monkeys), I still suggest that they carefully note this limitation in the discussion.

Reviewer #3: I thank the authors for their responses and I have no remaining remarks. I think the paper is ready for publication.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: None

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, we strongly recommend that you use PLOS’s NAAS tool (https://ngplosjournals.pagemajik.ai/artanalysis) to test your figure files. NAAS can convert your figure files to the TIFF file type and meet basic requirements (such as print size, resolution), or provide you with a report on issues that do not meet our requirements and that NAAS cannot fix.

After uploading your figures to PLOS’s NAAS tool - https://ngplosjournals.pagemajik.ai/artanalysis, NAAS will process the files provided and display the results in the "Uploaded Files" section of the page once processing is complete. If the uploaded figures meet our requirements (or NAAS is able to fix the files to meet our requirements), the figure will be marked as "fixed" above. If NAAS is unable to fix the files, a red "failed" label will appear above. When NAAS has confirmed that the figure files meet our requirements, please download the files via the download option, and include these NAAS-processed figure files when submitting your revised manuscript.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 2

Attachments
Attachment
Submitted filename: Point to Point Response_Rev2_Ver1.docx
Decision Letter - Yuanning Li, Editor

Dear Dr. Chen,

We are pleased to inform you that your manuscript 'Human visual grouping based on within- and cross-area temporal correlations' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Yuanning Li

Academic Editor

PLOS Computational Biology

Daniele Marinazzo

Section Editor

PLOS Computational Biology

***********************************************************

Formally Accepted
Acceptance Letter - Yuanning Li, Editor

PCOMPBIOL-D-25-00570R2

Human visual grouping based on within- and cross-area temporal correlations

Dear Dr Chen,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.