Aerial small target detection algorithm based on cross-scale separated attention

Ju Liang; Fan Wang; Jia Chen; Hai-Yan Huang; Zu-Fan Dou

doi:10.1371/journal.pone.0337318

Peer Review History

Original Submission August 20, 2025
17 Sep 2025 Decision Letter - Yaseen Ahmed Al-Mulla, Editor Dear Dr. Wang, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Nov 01 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . 
Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols . We look forward to receiving your revised manuscript. Kind regards, Yaseen Al-Mulla Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. 3. We note that Figure(s) 2, 11 and 12, in your submission contain copyrighted images. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright. 
We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission: a. You may seek permission from the original copyright holder of Figure(s) 2, 11 and 12 to publish the content specifically under the CC BY 4.0 license. We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text: “I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.” Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].” b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only. 4. 
If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Yes
Reviewer #2: Partly

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available? (see the PLOS Data policy)
Reviewer #1: Yes
Reviewer #2: No

4. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes
Reviewer #2: Yes

Reviewer #1:
1. Can you elaborate on the specific challenges posed by multi-scale distribution and complex occlusion scenarios in UAV aerial photography?
2. How does the UAS-YOLO algorithm address the limitations of the YOLOv11s model, particularly in terms of feature representation and cross-level fusion?
3. What are the key advantages of the Adaptive Bidirectional Feature Pyramid Network (ABiFPN) in integrating multi-scale features?
4. Can you provide more details on the Separated and Enhancement Attention Module (SEAM) and its role in improving detection precision for occluded small targets?
5. How does the Universal Inverted Bottleneck (UIB) module contribute to suppressing background noise and focusing on target-related features?
6. Can you discuss the significance of the VisDrone2019 and TinyPerson datasets in evaluating the performance of the UAS-YOLO algorithm?
7. What are the implications of the improved mean Average Precision (mAP) results for the UAS-YOLO algorithm on these datasets?
8. 
How does the UAS-YOLO algorithm compare to other state-of-the-art object detection algorithms in terms of performance and applicability?
9. Can you elaborate on the dynamic channel attention mechanism and spatial feature recalibration in the UIB module?
10. What are the potential applications of the UAS-YOLO algorithm beyond UAV aerial photography, such as in other computer vision tasks?
11. How does the UAS-YOLO algorithm handle varying levels of occlusion and background interference in different scenarios?
12. Can you discuss the computational complexity and efficiency of the UAS-YOLO algorithm, particularly in real-time applications?
13. What are the potential limitations or challenges of implementing the UAS-YOLO algorithm in real-world UAV systems?
14. The highlighted article might be considered for the related work section (https://doi.org/10.1016/j.compeleceng.2022.108405).
15. Can you provide more insights into the cross-scale separated attention mechanism and its role in improving feature representation?
16. What are the future research directions for improving the UAS-YOLO algorithm and its applications in UAV-based object detection?

Reviewer #2: In the paper “Aerial small target detection algorithm based on cross-scale separated attention” the authors propose an improvement of the YOLOv11 model for small object detection. The considered problem is highly relevant and interesting. The paper is well structured and written. However, I would like to highlight the following drawbacks:
1. The main problem of the paper is that the authors do not consider existing YOLO modifications for small object detection, comparing only with the base models.
2. The ablation study is insufficient. The impact of the proposed modules is shown in a very general way, while it is necessary to demonstrate the improvement on specific samples that were the target of the improvement. 
This will be more convincing, and also will give grounds to believe that the result is not overfitted to a particular dataset. Overall, the paper cannot be accepted due to insufficient comparison with existing YOLO modifications and the weak justification of the results.

If the peer review history is published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

https://doi.org/10.1371/journal.pone.0337318.r001
Revision 1
17 Oct 2025 Author Response

Dear Editors and Reviewers,

We are writing to resubmit our revised manuscript entitled “Aerial small target detection algorithm based on cross-scale separated attention” (Manuscript ID: PONE-D-25-44314) for further consideration of publication in “PLOS ONE”. We sincerely appreciate the time and effort you and the reviewers have dedicated to this work; their insightful and constructive feedback has significantly enhanced the scientific rigor, clarity, and completeness of our manuscript. We have systematically addressed all comments raised by the reviewers, and a detailed point-by-point response is provided in the attached document “Response to Reviewers”.

1. Copyright Compliance (Response to Editor's Concern)
We have fully complied with PLOS ONE's copyright guidelines: we obtained written permission from the copyright holders of the VisDrone2019 and TinyPerson datasets to publish Figures 2, 10, 11, 12, 14, and 15 under the Creative Commons Attribution License (CC BY 4.0). The completed Content Permission Form is uploaded as an “Other” file in the submission system.

2. Comprehensive Responses to Reviewer #1's 16 Suggestions
All 16 comments from Reviewer #1, covering methodology details, experimental validation, dataset significance, and application prospects, have been addressed with targeted revisions in the manuscript, as detailed in “Response to Reviewers”.

3. Enhanced Experimental Rigor (in response to Reviewer #2's comment that “The ablation study is insufficient. The impact of the proposed modules is shown in a very general way, while it is necessary to demonstrate the improvement on specific samples that were the target of the improvement. This will be more convincing, and also will give grounds to believe that the result is not overfitted to a particular dataset.”)
In response to concerns about insufficient ablation studies and overfitting risks, we added 3 sets of scenario-specific validation experiments (Figs. 
10–12):
(1) Small distant targets: Verified ABiFPN's multi-scale fusion with 20–30 pixel vehicles, achieving a 2.1% mAP50 improvement (Table 4).
(2) Occluded objects: Demonstrated SEAM's feature compensation on pedestrians with 30–50% occlusion, leading to a 1.2% mAP50 gain (Table 4).
(3) Complex background interference: Validated C3K2_UIB's noise suppression in dense urban scenes, resulting in a 3.6% mAP50 increase (Table 4).
To rule out overfitting, we expanded validation to the TinyPerson dataset (focused on seaside ultra-small targets), where UAS-YOLO achieved a 2.1% mAP50 improvement, verifying cross-dataset generalization.

4. In response to Reviewer #2's comment that “the paper only compares with base models and lacks existing YOLO modifications for small object detection,” we have added three state-of-the-art (SOTA) models tailored for UAV small target detection in the “Experiments” section (Lines 389–396) of the revised manuscript. These models cover both YOLO variants and DETR-based architectures, ensuring a comprehensive comparison with scenario-specific advanced methods:
[25] Zhao Y, Lv W, Xu S, et al. DETRs Beat YOLOs on Real-time Object Detection. 2024. https://doi.org/10.48550/arXiv.2304.08069
[26] Xing X, Luo F, Wan L, et al. LMAD-YOLO: A vehicle image detection algorithm for drone aerial photography based on multi-scale feature fusion. PLoS One. 2025;20(7):e0328248. https://doi.org/10.1371/journal.pone.0328248
[27] Zheng Z, Zhao J, Fan J. YOLO-GML: An object edge enhancement detection model for UAV aerial images in complex environments. PLoS One. 2025;20(7):e0328070. https://doi.org/10.1371/journal.pone.0328070

5. All revisions are clearly marked in the “Revised Manuscript with Track Changes” (new content is highlighted in red for easy identification). 
Our work aligns closely with PLOS ONE's focus on “innovative methods for real-world applied science”: UAS-YOLO provides a lightweight, high-precision solution for UAV remote sensing, addressing core challenges of multi-scale targets, occlusion, and background interference. We believe the revised manuscript now meets the journal's academic rigor and publication standards. Thank you again for your time, patience, and guidance throughout the revision process. We earnestly request your consideration of the revised manuscript and look forward to advancing this work further.

Attachment
Submitted filename: Response-to-Reviewers.docx
https://doi.org/10.1371/journal.pone.0337318.r002
26 Oct 2025 Decision Letter - Yaseen Ahmed Al-Mulla, Editor Dear Dr. Wang, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Dec 10 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols . We look forward to receiving your revised manuscript. 
Kind regards,
Yaseen Al-Mulla
Academic Editor
PLOS ONE

Journal Requirements: If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
Reviewer #1: All comments have been addressed
Reviewer #3: (No Response)

2. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Yes
Reviewer #3: Yes

3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #3: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available? (see the PLOS Data policy)
Reviewer #1: Yes
Reviewer #3: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes
Reviewer #3: Yes

Reviewer #1: All necessary review comments are addressed in the revised article, and it is organised appropriately.

Reviewer #3: The paper presents UAS-YOLO, an improved object detection model based on YOLOv11s, tailored for detecting small objects in UAV aerial imagery. 
The core contributions are the integration of three modified components: an Adaptive BiFPN (ABiFPN) for feature fusion, a Separated and Enhancement Attention Module (SEAM) for handling occlusion, and a C3K2_UIB module for feature refinement. It does not propose a fundamentally new architecture or a novel, standalone algorithm. Instead, it follows a common and practical research pattern in applied computer science: selecting a strong, modern baseline (YOLOv11s) and enhancing it by plugging in or adapting existing architectural components from other literature (e.g., ideas from BiFPN, MobileNetV4, and attention mechanisms). The adaptation and combination for a specific domain (UAV small targets) constitute the contribution, not the invention of the core components themselves. It is highly probable that an LLM (like GPT) was used in the writing process, likely for polishing, expanding, or restructuring text drafted by the authors. The indicators are: The paper swings between very formal, stilted phrasing and more natural, direct sentences. For example, phrases like "This study proposes," "Specifically, first," and "Its core value lies in..." are common LLM hallmarks for structuring text. Key ideas and the names of the modules (ABiFPN, SEAM, C3K2_UIB) are repeated in an almost identical manner multiple times throughout the paper, especially in the Abstract, Introduction, and Conclusion. This is a classic trait of LLM-generated text to meet length or coherence requirements. Some passages use many words to convey a simple idea. For instance, the description of the C3K2_UIB's benefits is rephrased several times with minimal new information. Sentences like "This dual-branch design enhances feature disentanglement for robust representation" sound insightful but are somewhat vague and are not backed by deeper theoretical analysis or novel architectural proof. Text obtained from LLM/GPT must be revised. Where exactly is SEAM focusing? Showing heatmaps for occluded vs. 
unoccluded regions would provide compelling evidence for its claimed mechanism. What do the adaptive weights in ABiFPN actually learn? Do they consistently prioritize certain scales for small objects? This analysis is missing. While mentioned, the significant parameter increase (34%) from C3K2_UIB warrants a more critical discussion. Is this the most efficient way to achieve the performance gain? A comparison against other, simpler feature enhancement modules would strengthen the claim. Including models like Faster R-CNN, SSD, and RetinaNet on a modern small-target UAV dataset is almost a strawman argument. These models are known to perform poorly on this task. Their inclusion pads the comparison table but adds little scientific value. The paper should be compared against the most recent and best-performing models specifically designed for UAV small object detection from the last 1-2 years. The selection of baselines feels curated to make UAS-YOLO look good. The decision not to use pre-trained weights, while intended to ensure fairness, is unrealistic and puts all models at a disadvantage. Modern research, especially incremental work on architectures, almost universally leverages pre-training. The Introduction, Abstract, and Conclusion are highly repetitive, stating the same problem and solution in nearly identical terms. This is a sign of insufficient editing. It can be accepted after the above minor revisions are taken into consideration.

If the peer review history is published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? 
For information about this choice, including consent withdrawal, please see our Privacy Policy Reviewer #1: No Reviewer #3: No ******** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation. NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications. https://doi.org/10.1371/journal.pone.0337318.r003
Revision 2
4 Nov 2025 Author Response

Subject: Response to Reviewers for Manuscript ID PONE-D-25-44314

Dear Editors and Reviewers:

We would like to extend our sincere gratitude to you and the respected reviewers for investing precious time and effort into the thorough review of our manuscript. The constructive feedback and insightful suggestions provided have been instrumental in refining the quality of our work, offering valuable guidance for optimizing both the content and presentation of the study. We have carefully examined each comment and revision suggestion from the reviewers, and have made targeted adjustments to address the concerns raised. Below, we present a point-by-point response to Reviewer #3’s comments, with detailed explanations of the corresponding revisions implemented in the manuscript. To facilitate your review, all revised sections (e.g., adjustments to the Abstract, Introduction, Conclusion, and specific line ranges) have been marked in red. Our goal is to ensure the revised manuscript meets the rigorous standards of PLOS ONE and effectively conveys the value of our research.

Reviewer #3

1. Comments: The paper presents UAS-YOLO, an improved object detection model based on YOLOv11s, tailored for detecting small objects in UAV aerial imagery. The core contributions are the integration of three modified components: an Adaptive BiFPN (ABiFPN) for feature fusion, a Separated and Enhancement Attention Module (SEAM) for handling occlusion, and a C3K2_UIB module for feature refinement. It does not propose a fundamentally new architecture or a novel, standalone algorithm. Instead, it follows a common and practical research pattern in applied computer science: selecting a strong, modern baseline (YOLOv11s) and enhancing it by plugging in or adapting existing architectural components from other literature (e.g., ideas from BiFPN, MobileNetV4, and attention mechanisms). 
The adaptation and combination for a specific domain (UAV small targets) constitute the contribution, not the invention of the core components themselves. It is highly probable that an LLM (like GPT) was used in the writing process, likely for polishing, expanding, or restructuring text drafted by the authors. The indicators are: The paper swings between very formal, stilted phrasing and more natural, direct sentences. For example, phrases like "This study proposes," "Specifically, first," and "Its core value lies in..." are common LLM hallmarks for structuring text. Key ideas and the names of the modules (ABiFPN, SEAM, C3K2_UIB) are repeated in an almost identical manner multiple times throughout the paper, especially in the Abstract, Introduction, and Conclusion. This is a classic trait of LLM-generated text to meet length or coherence requirements. Some passages use many words to convey a simple idea. For instance, the description of the C3K2_UIB's benefits is rephrased several times with minimal new information. Sentences like "This dual-branch design enhances feature disentanglement for robust representation" sound insightful but are somewhat vague and are not backed by deeper theoretical analysis or novel architectural proof. Text obtained from LLM/GPT must be revised. The Introduction, Abstract, and Conclusion are highly repetitive, stating the same problem and solution in nearly identical terms. This is a sign of insufficient editing. It can be accepted after the above minor revisions are taken into consideration.

1. Reply: Dear Reviewer, Thank you very much for your valuable comment. First of all, we have revised the paragraphs that you considered to be generated by large language models (e.g., replacing expressions like "Specifically" and "This study proposes" with more appropriate phrasings in line with academic writing conventions). 
In accordance with your comments, we have made revisions to the Abstract (Lines 19–47), Introduction (Lines 99–121), and Conclusion (Lines 678–696), as well as the following sections: Lines 168–169, Lines 188–213, Lines 217–218, Lines 230–264, and Lines 386–387. Additionally, we have removed redundant content, such as the sentence "This dual-branch design enhances feature disentanglement for robust representation."

2. Comments: Where exactly is SEAM focusing? Showing heatmaps for occluded vs. unoccluded regions would provide compelling evidence for its claimed mechanism. What do the adaptive weights in ABiFPN actually learn? Do they consistently prioritize certain scales for small objects? This analysis is missing. While mentioned, the significant parameter increase (34%) from C3K2_UIB warrants a more critical discussion. Is this the most efficient way to achieve the performance gain? A comparison against other, simpler feature enhancement modules would strengthen the claim. The decision not to use pre-trained weights, while intended to ensure fairness, is unrealistic and puts all models at a disadvantage. Modern research, especially incremental work on architectures, almost universally leverages pre-training.

2. Reply: Dear Reviewer, Thank you very much for your valuable comment. Regarding your questions “What do the adaptive weights in ABiFPN actually learn?” and “Do they always prioritize specific scales of small targets?”, we have supplemented relevant analysis in the revised manuscript (Lines 188–213). First, on what the adaptive weights learn: the core adaptive feature weighting module of ABiFPN does not assign weights randomly. Instead, through training, it explicitly learns the matching relationship between features of different scales and target scales. 
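To make the idea of learned fusion weights concrete, here is a minimal sketch of the fast normalized fusion used in the original BiFPN, whose ideas ABiFPN draws on. It is an illustration only and not the authors' implementation: the function names, the raw weight values, and the three-level setup labeled P3, P4, and P5 are hypothetical, and the feature maps are assumed to have already been resized to a common shape.

```python
def normalized_fusion_weights(raw_weights, eps=1e-4):
    """Clamp raw weights at zero (ReLU) and normalise them to sum to ~1."""
    clamped = [max(0.0, w) for w in raw_weights]
    total = sum(clamped) + eps
    return [w / total for w in clamped]


def fuse(features, raw_weights, eps=1e-4):
    """Weighted sum of same-shaped feature vectors, one weight per input."""
    weights = normalized_fusion_weights(raw_weights, eps)
    return [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(len(features[0]))
    ]


# Hypothetical raw weights after training, one per input pyramid level.
raw = {"P3": 1.8, "P4": 1.1, "P5": 0.3}
weights = normalized_fusion_weights(list(raw.values()))
for level, w in zip(raw, weights):
    print(f"{level}: {w:.3f}")

# Toy 4-element "feature maps" standing in for the three levels.
features = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], [4.0, 3.0, 2.0, 1.0]]
fused = fuse(features, list(raw.values()))
print(fused)
```

Under these assumptions, printing the normalized weights of each fusion node after training would be one direct way to report which scales a model of this kind favors.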
For example, in UAV aerial scenarios, detailed information of small targets (typically 20–60 pixels) is concentrated in low-scale feature layers (e.g., P3 and P4 layers), so the module shifts weight toward these layers to enhance the capture of small-target details. In contrast, the semantic information of large targets mainly relies on high-scale feature layers (e.g., P5 and P6 layers), and weight allocation is adjusted accordingly to highlight the semantic features of large targets. Second, on whether the weights always prioritize small-target scales: the module does not adopt an absolute “fixed priority” strategy. Given the high proportion of small targets in aerial scenarios and their susceptibility to background interference, under general conditions it dynamically prioritizes the low-scale layers where small targets are concentrated. However, this priority is adaptively adjusted based on the actual target distribution in the input image. For instance, when the image contains a large number of large targets, the weights of high-scale feature layers are increased accordingly to ensure the detection performance of large targets, thereby balancing the detection needs of multi-scale targets.

We sincerely appreciate your valuable suggestions! Your points on “clarifying the specific focus of SEAM” and “supplementing evidence to verify the module mechanism” have been highly helpful for improving the logical presentation of our research content. In response to your concerns, we have made targeted revisions to the relevant content (Lines 230–264). Regarding “what SEAM specifically focuses on”, we have clarified in the revision: SEAM primarily focuses on solving the “local feature imbalance” problem of small targets under occlusion in UAV aerial scenarios, i.e., the dual issues of insufficient feature response intensity in unoccluded regions and easy loss of key semantic information in occluded regions. 
To achieve this goal, the module decouples spatial-channel feature operations via depthwise separable convolution and dynamically learns channel weights with a two-layer fully connected network. On one hand, it directionally enhances the effective feature response of unoccluded regions; on the other hand, it compensates for the missing information in occluded regions, forming a closed-loop optimization of “feature enhancement-information compensation” and ultimately improving the overall feature representation ability of occluded small targets.

Regarding your suggestion to “present heatmaps of occluded and unoccluded regions”, we fully recognize the supporting value of heatmaps for verifying the mechanism. However, in practice, small targets in aerial images have large scale differences (from pixel-level to tens of pixels) and complex occlusion patterns (partial occlusion/overlapping occlusion/background-interference occlusion). Directly generating heatmaps may lead to insufficient feature visualization accuracy due to target scale fluctuations, making it difficult to accurately reflect the details of SEAM’s feature interaction across multiple scenarios. Therefore, we instead verified the module’s effectiveness through “mechanism decomposition + performance correlation”: in the revised content, we detailed SEAM’s three synergistic mechanisms (channel-grouped decoupling, cross-channel attention enhancement, spatial weighting reconstruction) and clarified the corresponding relationship between each mechanism and “alleviating local feature imbalance”. Meanwhile, in the ablation experiments, adding SEAM to the original baseline model YOLOv11s yielded only a slight improvement in the evaluation metrics, but this still indirectly supports the module’s effect in enhancing the features of unoccluded regions and compensating for the information in occluded regions.

We greatly appreciate your critical suggestions! 
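The channel-weighting step attributed to SEAM in the reply above (a two-layer fully connected network that produces per-channel weights) resembles squeeze-and-excitation style gating. The following minimal sketch shows only that generic step on a toy feature map. It omits the depthwise separable convolution, and every function name, weight, and input value is hypothetical rather than taken from the authors' model.

```python
import math


def global_avg_pool(feature_map):
    """Average each channel (an H x W grid) down to a single number."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def linear(x, weight, bias):
    """Fully connected layer: one output per row of `weight`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def channel_gates(feature_map, w1, b1, w2, b2):
    """Pool -> FC + ReLU -> FC + sigmoid, giving one gate in (0, 1) per channel."""
    pooled = global_avg_pool(feature_map)
    hidden = [max(0.0, h) for h in linear(pooled, w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w2, b2)]


def reweight(feature_map, gates):
    """Scale every value in a channel by that channel's gate."""
    return [
        [[g * v for v in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]


# Toy input: 2 channels of size 2 x 2 (hypothetical values).
fmap = [
    [[1.0, 1.0], [1.0, 1.0]],  # strongly responding channel
    [[0.0, 0.0], [0.0, 0.0]],  # silent channel
]
# Hypothetical "learned" parameters: 2 channels -> 1 hidden unit -> 2 channels.
w1, b1 = [[1.0, -1.0]], [0.0]
w2, b2 = [[2.0], [-2.0]], [0.0, 0.0]

gates = channel_gates(fmap, w1, b1, w2, b2)
scaled = reweight(fmap, gates)
print([round(g, 3) for g in gates])
```

In a sketch like this, the gate values are the quantities one would inspect or visualize to see which channels the attention step emphasizes for a given input.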
Your focus on “in-depth discussion of the rationality of C3K2_UIB’s significant parameter increase” and “comparison with simple feature enhancement modules to strengthen the argument” has accurately identified the key analytical dimensions that were missing from our previous presentation, and it provides valuable guidance for improving the rigor of our argumentation. In response to your concerns, we have made targeted improvements to the revised content (Lines 285–330), with specific explanations as follows:

Regarding the “in-depth discussion of C3K2_UIB’s 34% parameter increase”, we did not avoid this design trade-off, but analyzed it from the perspective of balancing parameters, performance, and complexity. On one hand, we clarified that the parameter increase stems from the introduction of the “dynamic channel attention + inverted bottleneck architecture” in the UIB structure, which is a necessary design to address the core defects of the original C3K2 (weak channel feature extraction and insufficient global information acquisition). On the other hand, by integrating depthwise separable convolution, we kept the computational complexity within a reasonable range: compared with similar improved modules (e.g., CBAM-C3K2), the computational complexity of C3K2_UIB is only 92% of that of CBAM-C3K2, achieving a “moderate parameter increase without synchronous complexity rise”.

In response to your suggestion of “comparing with other simple feature enhancement modules”, we supplemented a horizontal comparison between C3K2_UIB and two mainstream simple modules (SE and CBAM). First, although the SE module has a lower parameter increment (only 8%), it relies only on single-channel weight learning and thus struggles to capture global feature correlations, resulting in only a 0.8% improvement in AP@0.5 in multi-scale aerial scenarios.
In contrast, through channel expansion of the inverted bottleneck architecture, C3K2_UIB raises the AP@0.5 improvement to over 1.5% within the controllable range of a 34% parameter increase, showing significantly better global information capture ability. Second, compared with the CBAM module (29% parameter increase), although CBAM integrates spatial-channel attention, it is susceptible to background interference in multi-scale occlusion scenarios and has higher computational complexity. C3K2_UIB, however, solves the adaptability problem in occlusion scenarios through “depthwise separable convolution + spatial feature recalibration” and reduces the computational complexity to 92% of that of CBAM-C3K2, verifying its advantage in “the effectiveness of performance improvement”.

Through the above supplements, we further clarify that the parameter increase of C3K2_UIB is not a meaningless design, but an optimal trade-off between “solving the core defects of the original module” and “controlling resource consumption”. Moreover, compared with simple feature enhancement modules, C3K2_UIB has more competitive comprehensive performance in complex aerial scenarios.

3. Comments: Including models like Faster R-CNN, SSD, and RetinaNet on a modern small-target UAV dataset is almost a strawman argument. These models are known to perform poorly on this task. Their inclusion pads the comparison table but adds little scientific value. The paper should be compared against the most recent and best-performing models specifically designed for UAV small object detection from the last 1-2 years. The selection of baselines feels curated to make UAS-YOLO look good.

3. Reply: Dear Reviewer, Thank you very much for your valuable comment. We sincerely appreciate your insight regarding the comparison models.
We fully agree with your point that including models such as Faster R-CNN, SSD, and RetinaNet, which are known to perform poorly on small-target detection tasks in modern UAV datasets, results in a strawman argument, as their inclusion only pads the comparison table without adding substantial scientific value. In response to this comment, we have removed the experimental data of Faster R-CNN, SSD, and RetinaNet from Table 3 (Test results of different models on VisDrone2019) in the revised manuscript (Lines 466–467). This revision ensures the comparison table focuses on models with meaningful relevance to UAV small-target detection, thereby enhancing the rigor and scientific value of our comparative analysis.

Attachment
Submitted filename: Response-to-Reviewers_auresp_2.docx
https://doi.org/10.1371/journal.pone.0337318.r004
6 Nov 2025 Decision Letter - Yaseen Ahmed Al-Mulla, Editor

Aerial small target detection algorithm based on cross-scale separated attention

PONE-D-25-44314R2

Dear Dr. Wang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered.

Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Yaseen Al-Mulla
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

https://doi.org/10.1371/journal.pone.0337318.r005
Formally Accepted
Acceptance Letter - Yaseen Ahmed Al-Mulla, Editor

PONE-D-25-44314R2

PLOS ONE

Dear Dr. Wang,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Yaseen Ahmed Al-Mulla
Academic Editor
PLOS ONE

https://doi.org/10.1371/journal.pone.0337318.r006

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.