
KAN-GLNet: An enhanced PointNet++ model for canola silique segmentation and counting

  • Jiajun Liu,

    Roles Conceptualization, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation College of Information Engineering, Sichuan Agricultural University, Yaan, China

  • Bei Zhou ,

    Roles Supervision, Writing – review & editing

    12801@sicau.edu.cn

    Affiliations College of Information Engineering, Sichuan Agricultural University, Yaan, China, Sichuan Key Laboratory of Agricultural Information Engineering, Sichuan Agricultural University, Yaan, China

  • Jie Liu,

    Roles Data curation, Funding acquisition

    Affiliation College of Agronomy, Sichuan Agricultural University, Chengdu, China

  • Xike Zhang,

    Roles Validation

    Affiliation College of Information Engineering, Sichuan Agricultural University, Yaan, China

  • Jiangshu Wei,

    Roles Supervision

    Affiliations College of Information Engineering, Sichuan Agricultural University, Yaan, China, Sichuan Key Laboratory of Agricultural Information Engineering, Sichuan Agricultural University, Yaan, China

  • Yao Zhang,

    Roles Validation

    Affiliation College of Information Engineering, Sichuan Agricultural University, Yaan, China

  • Junjie Wu,

    Roles Project administration

    Affiliation College of Information Engineering, Sichuan Agricultural University, Yaan, China

  • Changping Wu,

    Roles Visualization

    Affiliation College of Agronomy, Sichuan Agricultural University, Chengdu, China

  • Di Hu

    Roles Investigation

    Affiliation College of Information Engineering, Sichuan Agricultural University, Yaan, China

Abstract

Accurate analysis of plant phenotypic traits is crucial for crop breeding and precision agriculture. This study proposes a lightweight semantic segmentation model named KAN-GLNet (Kolmogorov–Arnold Network with Global–Local Feature Modulation), based on an enhanced PointNet++ architecture and integrated with an optimized Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, to achieve high-precision segmentation and automatic counting of canola siliques. A multi-view point cloud acquisition platform was built, and high-fidelity canola point clouds were reconstructed using Neural Radiance Fields (NeRF) technology. The proposed model includes three key modules: Reverse Bottleneck Kolmogorov–Arnold Network Convolution, a Global–Local Feature Modulation (GLFN) block, and a contrastive learning-based normalization module called ContraNorm. KAN-GLNet contains only 5.72M parameters and achieves 94.50% mIoU, 96.72% mAcc, and 97.77% OAcc in semantic segmentation tasks, outperforming all baseline models. In addition, the DBSCAN workflow was optimized, achieving a counting accuracy of 97.45% in the instance segmentation task. This method achieves an excellent balance between segmentation accuracy and model complexity, providing an efficient solution for high-throughput plant phenotyping. The code and dataset have been made publicly available at: https://anonymous.4open.science/r/KAN-GLNet-6432/.

Introduction

Plant phenotype, determined by the interaction between plants and their environment, serves as one of the key information sources for studying and managing plant growth and development [1]. As a globally important oil crop, rapeseed is widely cultivated due to its economic and ecological value [2]. Yield is primarily determined by silique number, seeds per silique, and thousand-seed weight [3], with silique number being the most critical factor. However, during the silique maturation stage, the dense and overlapping distribution of siliques makes accurate identification extremely challenging. While deep learning models can handle this complexity, their high parameter count incurs substantial storage and computational costs, limiting high-throughput analysis. Traditional manual counting methods are inefficient and highly subjective. Therefore, achieving high-precision recognition with low-parameter models in complex plant scenarios is crucial for overcoming computational bottlenecks and advancing precision agriculture.

Traditional point cloud generation technologies each have distinct characteristics: LiDAR achieves millimeter-level precision but involves high hardware costs; depth cameras enable real-time acquisition but are susceptible to lighting interference; multi-view stereo (MVS) has low equipment costs but requires hours of computational processing. In contrast, Neural Radiance Fields (NeRF, [4]) represent a technological breakthrough through implicit neural representation—experiments demonstrate its reconstruction accuracy matches MVS levels [5]. The improved Nerfacto model [6] not only significantly reduces training time but also achieves superior geometric fidelity. Building on this, our study establishes the first fully annotated rapeseed NeRF dataset, providing an innovative high-precision, low-cost solution for 3D phenotyping of complex plant organs.

Accurate organ structure segmentation forms the foundation for quantifying morphological phenotypic parameters from 3D point cloud data. Traditional semantic segmentation methods employ geometric topological models, handcrafted features, and prior knowledge about plant objects to describe and distinguish different plant organs. For instance, Wu Sheng et al. [7] extracted maize skeletons across growth stages using Laplacian contraction combined with adaptive sampling, calculating plant height, leaf length, and inclination angles. Paloma Sodhi et al. [8] achieved precise sheath-stem segmentation and phenotyping in sorghum through multi-view 3D reconstruction and SVM classification with local-global feature fusion. Li Dawei et al. [9] proposed a region-growing segmentation method involving iterative PCA-based spatial feature calculation, supervoxel over-segmentation, and region growth for greenhouse plant leaf segmentation. However, these methods heavily rely on prior morphological knowledge, require tedious parameter tuning, are noise-sensitive, and have limited capability for analyzing complex structures and traits [10–12].

To address these challenges, deep learning-based point cloud segmentation methods have emerged, automatically learning features through data-driven approaches. Early research primarily adopted voxelization to convert point clouds into structured data for feature extraction [13]. For example, Das Choudhury et al. [14] segmented maize stems and leaves using multi-view visual hull algorithms with voxel overlap verification and Euclidean clustering. Saeed et al. [15] implemented 3D segmentation of cotton stems, branches, and bolls via Point-Voxel Convolutional Neural Networks (PVCNN). However, these methods demand substantial computational resources and may incur information loss during segmentation. To overcome point cloud processing bottlenecks, subsequent research focused on direct point cloud processing architectures, mainly developing along two directions: point-based methods and Transformer-based approaches. The pioneering PointNet series [16,17] laid the groundwork for point cloud deep learning, providing crucial technical support for plant 3D phenotyping. For instance, Ao et al. [18] applied PointNet for maize stem-leaf separation using local point density. Guo et al. [19] integrated PointNet++ with ASAP attention modules, achieving 86% mIoU in cabbage segmentation. Addressing PointNet++ optimization needs, PointNeXt achieved performance breakthroughs through training strategy updates and model scaling, with Dong et al. [20] reporting 89.21% mIoU (sugarcane), 89.19% mIoU (maize), and 83.05% mIoU (tomato) stem-leaf segmentation at 6.08M parameters. While such point-based models excel in parameter efficiency, they often face accuracy limitations due to insufficient feature extraction. PCT [21] pioneered Transformer architecture for 3D point cloud processing, marking its first application in point cloud segmentation. For example, Ma et al. [22] proposed PSTNet with cascaded self-attention (PSA) and local feature aggregation (NPA), achieving 92.20% IoU for eggplant point clouds. Yang et al. [23] developed PACANet, a Transformer-based pairwise attention center axis aggregation network, achieving 92.46% mean accuracy for maize populations at 46.2M parameters. These studies demonstrate that while Transformer architectures deliver breakthrough performance, they often suffer from parameter explosion, severely limiting deployment in resource-constrained scenarios.

To address the contradiction between parameter quantity and segmentation accuracy in plant point cloud segmentation, this study proposes KAN-GLNet, a Kolmogorov-Arnold segmentation network equipped with global-local feature modulation. The model is designed to achieve high-precision segmentation under low-parameter constraints. The key innovations include:

(i) Constructing the first NeRF-derived rapeseed silique point cloud dataset (50 samples), expanding sample scale through data augmentation strategies.

(ii) Proposing reverse bottleneck KAN convolution (enhanced Kolmogorov-Arnold convolution) to replace conventional convolutions for strengthening feature extraction capability.

(iii) Designing GLFN multi-scale feature modulation blocks to optimize spatial representation of complex silique structures by fusing global-local attention mechanisms.

(iv) Introducing the ContraNorm contrastive normalization module to enhance point cloud feature stability and segmentation consistency.

(v) Proposing an optimized DBSCAN workflow algorithm to achieve automated silique instance counting.

Materials and methods

Overview

The overall process of canola silique segmentation and counting is shown in Fig 1, which consists of five stages: data acquisition, 3D point cloud reconstruction, data preprocessing and augmentation, canola silique segmentation, and silique counting.

Fig 1. Overview of canola silique segmentation and counting.

(A) Sparse reconstruction. (B) Dense reconstruction. (C) Point cloud preprocessing. (D) Point cloud augmentation. (E) Segmentation model prediction result. (F) Segmented silique point cloud. (G) Silique instance clustering result.

https://doi.org/10.1371/journal.pone.0336622.g001

Data collection

The canola plants used in this study were cultivated in May 2024 at the No. 91 experimental field of Sichuan Agricultural University, Ya’an City, Sichuan Province, China, located at 30°N 103°E. To achieve high-throughput data acquisition, we employed a multi-angle photography approach, aiming to comprehensively capture the structural features of the canola plants [24]. As shown in Fig 2, the custom imaging platform consisted of four core components: a smartphone, two sets of supplemental lighting equipment, an electric turntable, and a black backdrop. Given the inherent limitations of smartphone image sensor sensitivity, the lighting system served a critical function. The turntable precisely controlled shooting angles while the backdrop effectively eliminated background interference.

Data collection employed an Apple iPhone 15 Pro Max smartphone featuring a 48-megapixel main camera with an f/1.78 aperture and 24mm focal length. The study involved standardized acquisition of 50 canola plants, each positioned centrally on a turntable rotating at a constant angular velocity of 6 degrees per second. The camera maintained a fixed horizontal distance of 50 centimeters from the subject, systematically capturing video recordings from two distinct perspectives: a horizontal view and a 60-degree downward-tilted view. Each perspective was recorded for one minute, yielding two minutes of total footage per plant. The description of acquisition parameters was inspired by [25], particularly in terms of presenting camera specifications and shooting settings. Complete parameter details are provided in Table 1.

Table 1. Rapeseed video acquisition and image processing data protocol.

https://doi.org/10.1371/journal.pone.0336622.t001

For image processing, FFmpeg software [26] extracted keyframes at one-second intervals, generating 120 frames per plant at a resolution of 3840 by 2160 pixels in JPG format, suitable for subsequent 3D reconstruction and analysis. The turntable’s continuous rotation protocol ensured optimal imaging stability while effectively preventing motion blur artifacts.
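The keyframe-extraction step can be sketched as follows. This is a minimal illustration of calling FFmpeg's `fps` filter from Python to sample one frame per second of footage; the file names and the JPG quality setting are assumptions, not the authors' exact command.

```python
import subprocess

def build_ffmpeg_cmd(video_path: str, out_pattern: str, fps: int = 1) -> list:
    """Build an ffmpeg command that samples `fps` frames per second of video
    and writes them as numbered JPG files (e.g. frame_0001.jpg)."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",   # one keyframe per second of footage
        "-qscale:v", "2",      # high JPG quality for 3D reconstruction
        out_pattern,
    ]

cmd = build_ffmpeg_cmd("plant_01.mp4", "plant_01/frame_%04d.jpg")
# subprocess.run(cmd, check=True)  # uncomment to run the extraction
```

A one-minute clip per view at `fps=1` yields 60 frames, matching the 120 frames per plant reported for the two views.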

Point cloud generation and processing

NeRF is a technique that utilizes neural networks to represent scenes in three dimensions by learning volumetric representations from multi-view 2D images, thereby producing high-quality 3D reconstructions. The reconstruction process includes image acquisition, ray sampling, volume rendering, loss calculation and network optimization, volume density extraction, and point cloud generation, ultimately culminating in the completion of the 3D reconstruction.

NeRFStudio [27] is a framework designed for the creation, training, and deployment of NeRF models, aimed at simplifying the application process of NeRF technology, thus enabling researchers to utilize this technique more easily. Among these models, Nerfacto combines the advantages of traditional NeRF with the latest technological advancements, enhancing the speed, quality, and flexibility of 3D reconstruction. In this study, the Nerfacto model was employed to generate the point cloud data for canola. This method boasts a high reconstruction efficiency, with each object requiring approximately 20 minutes for reconstruction.

In the reconstruction process, images of the canola from multiple angles were loaded and synchronized. COLMAP [28] was used for feature matching and pose estimation to generate sparse point clouds and camera poses, which were then input into the Nerfacto model to create high-quality dense point clouds. These point clouds were saved in a P × 6 array, where P represents the total number of points, with columns for spatial coordinates (X, Y, Z) and corresponding RGB values, stored in PLY format. Fig 3 displays multiple sets of canola point cloud data generated by the Nerfacto model.

Nerfacto-generated point cloud models are highly dense, with each canola plant’s 3D model consisting of 200,000 to 300,000 points. These models initially included not only the canola plants but also trays, turntables, and significant noise points. As the main goal was to segment the canola siliques, the tray and turntable components were manually removed using CloudCompare [29], and noise was removed using radius-based and statistical methods due to the numerous outliers.

After denoising, a uniform downsampling method was applied to preserve the point cloud’s distribution characteristics and essential information [30]. The point cloud for each canola plant was downsampled to 20,000–30,000 points to reduce data volume and improve computational efficiency [31].
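The downsampling step above can be sketched in numpy. The authors performed this in CloudCompare; the every-k-th-point strategy below is an illustrative stand-in for uniform downsampling, assuming the denoised cloud is stored as a P × 6 array (XYZ + RGB).

```python
import numpy as np

def uniform_downsample(points: np.ndarray, target: int) -> np.ndarray:
    """Uniformly subsample a (P, 6) point cloud (XYZ + RGB) to roughly
    `target` points by keeping every k-th point, which preserves the
    overall spatial distribution of the cloud."""
    step = max(1, points.shape[0] // target)
    return points[::step]

cloud = np.random.rand(250_000, 6)       # a dense Nerfacto-scale cloud
small = uniform_downsample(cloud, 25_000)
```

After this step each plant falls in the reported 20,000–30,000 point range while keeping the silique geometry intact.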

Methodology

Network overview

PointNet++, as a mainstream framework for 3D point cloud analysis, extracts local and global features through its Set Abstraction (SA) and Feature Propagation (FP) mechanisms. However, it has limitations in modeling long-range dependencies and insufficient global feature representation. To address these issues, this study proposes the KAN-GLNet model (Fig 4), which reconstructs the Min-PointNet module in the SA layer into KGL-PointNet, achieving improvements in three aspects: First, we propose reverse bottleneck KAN convolutions to replace some of the original convolutional layers, which achieves more efficient geometric feature extraction with fewer parameters compared to standard KAN convolutions [32]; second, a GLFN feature modulation block is designed, combining Global and Local Spatial Attention (GLS Attention) with a Partial Convolution Network (PCFN) to jointly optimize local details and global contextual representation; finally, the ContraNorm contrastive normalization module [33] is introduced, leveraging contrastive learning to constrain feature distributions, suppress noise, and mitigate dimensional collapse and over-smoothing during training. The input point cloud batch has a feature dimension of D, with K neighboring points selected for each sampled point and N total sampled points. The overall architecture of KGL-PointNet and its core components are shown in Fig 5.

Fig 5. The overall architecture of KGL-PointNet and its core components.

(A) The complete network structure of KGL-PointNet. (B) The multi-scale feature enhancement module, GLS Attention. (C) The feedforward network based on partial convolution, PCFN.

https://doi.org/10.1371/journal.pone.0336622.g005

Reverse bottleneck KAN convolutions

The recently proposed Kolmogorov-Arnold Networks (KANs) have emerged as a promising alternative to MLPs [34], prompting us to examine them closely. KANs are a type of neural network architecture inspired by the pioneering theories of Andrey Kolmogorov and Vladimir Arnold. Unlike traditional MLPs, KANs replace linear weights with learnable spline functions. The advantage of this approach is that it not only reduces the number of required parameters but also improves the generalization ability of the network to some extent.

Bodner et al. [32] proposed Kolmogorov-Arnold Convolutions, which are defined as follows. Let the input image be $y \in \mathbb{R}^{c \times h \times w}$, where c denotes the number of input channels, and h and w are the height and width of the image, respectively. The Kolmogorov-Arnold Convolution with a kernel size of k is defined in Eq (1):

$$\big(\mathrm{Conv}_{\mathrm{KAN}}(y)\big)_{i,j} = \sum_{a=1}^{k} \sum_{b=1}^{k} \phi_{ab}\big(y_{i+a,\,j+b}\big) \tag{1}$$

The function $\phi$ is a univariate nonlinear learnable function. Its specific form is given in Eqs (2) and (3), as defined in the original KANs paper [34]:

$$\phi(x) = w_b\, b(x) + w_s\, \mathrm{spline}(x) \tag{2}$$

$$b(x) = \mathrm{silu}(x) = \frac{x}{1 + e^{-x}}, \qquad \mathrm{spline}(x) = \sum_{i} c_i B_i(x) \tag{3}$$

Here, $w_b$ and $w_s$ are trainable weights used to control the overall scale of the function. The term $\mathrm{spline}(x)$ represents a spline function, which by default is a linear combination of B-spline basis functions $B_i(x)$ with trainable coefficients $c_i$.

In [34], B-splines are employed to approximate smooth functions. However, during training, variables may fall outside the predefined domain, which requires rescaling the spline grid. Although this method is theoretically sound, computing B-spline basis functions and rescaling the grid can introduce computational inefficiencies in KAN-based networks. To address this issue, Li [35] proposed FastKAN, which uses Gaussian radial basis functions (RBFs) to approximate the B-spline basis and thus accelerates model training. The Gaussian RBF is defined in Eq (4):

$$\phi(r) = \exp\!\left(-\frac{r^2}{2\sigma^2}\right) \tag{4}$$

where r denotes the radial distance and $\sigma$ is the standard deviation controlling the width of the Gaussian function.
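As a concrete illustration, the Gaussian RBF basis of Eq (4) can be evaluated over a fixed grid of centers that stands in for the B-spline grid. The number of centers and the value of sigma below are assumptions for the sketch, not the paper's hyperparameters.

```python
import numpy as np

def rbf_basis(x: np.ndarray, centers: np.ndarray, sigma: float) -> np.ndarray:
    """Evaluate the Gaussian RBF basis of Eq (4) at inputs `x`.
    Returns a (len(x), len(centers)) matrix with entries
    phi(r) = exp(-r^2 / (2 sigma^2)), where r is the distance
    from each input to each grid center."""
    r = x[:, None] - centers[None, :]
    return np.exp(-r**2 / (2 * sigma**2))

centers = np.linspace(-1.0, 1.0, 8)   # assumed stand-in for the spline grid
B = rbf_basis(np.array([0.0, 0.5]), centers, sigma=0.3)
```

Because no grid rescaling is needed when inputs drift outside the original domain, this basis is cheaper to evaluate than B-splines, which is the speed-up FastKAN exploits.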

However, although FastKAN accelerates model training, it does not solve the issue of KANs’ reliance on a large number of parameters during training, which still results in high training costs and increases the likelihood of overfitting. The problem of large convolutional parameter counts in KANs mainly arises in the spline part. Simply replacing the basis functions introduces a significant number of parameters into the model.

To overcome the limitations of existing methods, we propose Reverse Bottleneck KAN, which uses Gaussian kernel RBFs to approximate the B-spline basis. Unlike FastKAN, we remove the direct large-kernel convolution on the spline and adopt a reverse bottleneck structure. Specifically, this method first expands the dimensionality of the input data using a convolution with a kernel size of 1, then applies the spline, and finally reduces the dimensionality with another convolution of kernel size 1. This design, similar to a single-layer encoding-decoding process, improves feature representation while reducing the number of parameters. The network architectures of Kolmogorov-Arnold, FastKAN, and Reverse Bottleneck KAN Convolutions are shown in Fig 6.

Fig 6. Network architecture diagrams of Kolmogorov-Arnold, FastKAN, and reverse bottleneck KAN convolutions.

(A) Kolmogorov-Arnold and FastKAN Convolutions. (B) Reverse Bottleneck KAN Convolutions.

https://doi.org/10.1371/journal.pone.0336622.g006

GLFN feature modulation block

Although Reverse Bottleneck KAN Convolutions can extract initial features and generate basic representations from raw input, these features often struggle to adequately and finely represent key information in the point cloud, especially in terms of detail and precision. To further enhance the network’s ability to extract key features, we draw on the attention mechanism from [36] and the feature fusion method from [37], applying them innovatively to the 3D point cloud domain. To achieve this, we design the GLFN feature modulation module, designed to enhance feature extraction after the Reverse Bottleneck KAN Convolutions.

The GLFN module consists of the GLS Attention module and PCFN. The GLS Attention module enhances multi-scale feature representation, while the PCFN refines and denoises features using partial convolution, integrating local and global information. The module takes the output from Reverse Bottleneck KAN Convolutions as input and enhances the network’s ability to capture local geometry and global context through multi-level feature modulation. This design improves the expression of local details and global features, significantly boosting the overall performance of the network.

GLS attention.

The attention mechanism enhances relevant information while suppressing irrelevant details. The proposed GLS Attention module consists of two components: Global Spatial Attention (GS Attention) and Local Spatial Attention (LS Attention), which collaboratively enhance features at different spatial scales.

Given a feature tensor F with D channels, we first split it into two parts, $F_1$ and $F_2$, as shown in Eq (5). These two tensors are then passed through the GS and LS Attention modules, respectively. Their outputs are concatenated and fused via a 1 × 1 convolution, as defined in Eq (6):

$$[F_1, F_2] = \mathrm{Split}(F) \tag{5}$$

$$F_{\mathrm{out}} = \mathrm{Conv}_{1\times1}\big(\mathrm{Concat}\big(\mathrm{GS}(F_1),\, \mathrm{LS}(F_2)\big)\big) \tag{6}$$

Here, $\mathrm{GS}(\cdot)$ and $\mathrm{LS}(\cdot)$ denote the Global and Local Spatial Attention modules, respectively, and $F_{\mathrm{out}}$ is the final output feature after fusion via 1 × 1 convolution. The detailed structures of the GS Attention and LS Attention modules are illustrated in Fig 7.

Fig 7. Network architecture diagrams of the GS attention module and the LS attention module.

(A) GS Attention module structure. (B) LS Attention module structure.

https://doi.org/10.1371/journal.pone.0336622.g007

(1) GS Attention Module: The GS Attention module captures long-range dependencies between points in the spatial dimension of point clouds, thereby complementing the local spatial attention. Prior studies [38,39] have shown that such long-range interactions significantly enhance feature representation. The global spatial attention is computed using the feature tensor $F_1$, and the corresponding operations are defined in Eqs (7) and (8):

$$A_{\mathrm{gs}} = \mathrm{Softmax}\big(\mathrm{Conv}_{1\times1}(F_1) \otimes \mathrm{Conv}_{1\times1}(F_1)^{\top}\big) \tag{7}$$

$$\mathrm{GS}(F_1) = \mathrm{MLP}\big(A_{\mathrm{gs}} \otimes F_1\big) \tag{8}$$

Here, $A_{\mathrm{gs}}$ denotes the global spatial attention operator, $\mathrm{Conv}_{1\times1}$ is the 1 × 1 convolution, $\otimes$ indicates matrix multiplication, $\top$ represents transposition, and $\mathrm{MLP}$ includes two fully connected layers, a ReLU activation, and a normalization layer.

(2) LS Attention Module: The LS Attention module enhances local region features in point clouds, especially beneficial for highlighting small objects. We compute the local spatial attention and apply it to the tensor $F_2$, as shown in Eqs (9) and (10):

$$A_{\mathrm{ls}} = \mathrm{CB}(F_2) \tag{9}$$

$$\mathrm{LS}(F_2) = A_{\mathrm{ls}} \odot F_2 \tag{10}$$

In this context, $\mathrm{CB}$ is a cascaded block composed of three 1 × 1 convolution layers and one 3 × 3 depthwise convolution. The operator $\odot$ denotes element-wise multiplication, and $A_{\mathrm{ls}}$ represents local spatial attention. This design effectively captures fine-grained local features with minimal parameters.

PCFN module.

Although the GLS Attention module enhances multi-scale features via global and local spatial attention, it may fail to fully integrate the relationships between them, potentially leading to weak representation and sensitivity to noise. To address this, we introduce the PCFN module—an efficient part-convolutional feedforward network aimed at refining features, reducing noise, and balancing local-global integration.

As illustrated in Fig 7(C), the PCFN module begins by receiving the fused feature $F_{\mathrm{out}}$, which is normalized and passed through a 1 × 1 convolution followed by a GELU activation. The resulting hidden representation is then split into two feature blocks $F_a$ and $F_b$, as defined in Eq (11). One of them, $F_a$, undergoes a convolution and GELU activation to encode local context. Finally, the outputs are concatenated and projected back to the original dimension through another 1 × 1 convolution, as described in Eq (12):

$$[F_a, F_b] = \mathrm{Split}\big(\mathrm{GELU}\big(\mathrm{Conv}_{1\times1}(\mathrm{Norm}(F_{\mathrm{out}}))\big)\big) \tag{11}$$

$$F_{\mathrm{pcfn}} = \mathrm{Conv}_{1\times1}\big(\mathrm{Concat}\big(\mathrm{GELU}(\mathrm{Conv}(F_a)),\, F_b\big)\big) \tag{12}$$

Normalization

KAN-GLNet is influenced by the PointNet++ baseline model and applies Batch Normalization (BN) after the SA and FP layers to improve model stability, accelerate training, and prevent gradient vanishing or explosion. However, BN may lead to over-smoothing, causing the features to become similar and failing to retain the geometric differences between different point clouds, thereby reducing classification and segmentation accuracy. Moreover, since BN normalizes features only within each batch, it may lead to dimensional collapse, further limiting the network’s expressive power.

Inspired by [33], we adopt the contrastive learning-based normalization technique, ContraNorm, for 3D point cloud features. This method effectively mitigates issues such as dimensional collapse and over-smoothing, thereby enhancing the network’s capability to model complex geometric structures. The forward operation of ContraNorm is defined in Eq (13):

$$H_t = \mathrm{LN}\!\left(H_b - s \cdot \mathrm{Softmax}\!\left(\frac{H_b H_b^{\top}}{\tau}\right) H_b\right) \tag{13}$$

Here, $H_b$ denotes the input feature matrix, and $H_b^{\top}$ is its transpose. The scalar s is a scale factor, and $\tau$ is a temperature parameter that controls the strength of the contrastive term. The Softmax function normalizes the similarity matrix $H_b H_b^{\top}$, ensuring the similarity distribution is stable and bounded. The final output $H_t$ is passed through a Layer Normalization (LN) to maintain numerical stability and scale consistency.
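The forward operation of Eq (13) can be sketched in a few lines of numpy. This is a minimal, unbatched illustration; the values of s and the temperature are assumptions, and the learnable affine parameters of LayerNorm are omitted.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(h: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def contranorm(h: np.ndarray, s: float = 0.1, tau: float = 1.0) -> np.ndarray:
    """Eq (13): subtract a scaled, Softmax-normalized self-similarity term
    from the features, then apply LayerNorm for numerical stability."""
    sim = softmax(h @ h.T / tau)        # (N, N) similarity matrix
    return layer_norm(h - s * sim @ h)

feats = np.random.randn(64, 32)         # 64 points, 32-dim features
out = contranorm(feats)
```

Subtracting the similarity-weighted average pushes each point's feature away from its neighbors' mean, which is what counteracts over-smoothing and dimensional collapse.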

Canola silique recognition based on the DBSCAN algorithm

In plant phenotyping studies, applying clustering algorithms to the silique point clouds extracted after semantic segmentation is a common approach for achieving organ-level 3D point cloud recognition [40]. Elnashef et al. [41] employed the DBSCAN algorithm and utilized local point cloud density features to accomplish high-precision instance segmentation of the stem-leaf structures in dicotyledonous plants. Guo et al. [19], based on semantic segmentation of seedling-stage cabbage point clouds, combined DBSCAN with color filtering and edge filtering to successfully extract a variety of phenotypic traits.

However, directly applying the above clustering methods to canola silique point clouds still faces significant challenges. First, canola siliques often appear densely clustered, interlaced, and heavily overlapped, making it difficult to distinguish between adjacent clusters. Second, the semantic segmentation stage inevitably involves some misclassification between siliques and stems. These incorrect labels introduce additional noise into the clustering input, and outlier points further disrupt cluster connectivity.

Given that canola silique point clouds often contain outliers caused by segmentation errors, occlusions, or background interference, directly performing clustering tends to result in blurred or incorrect cluster boundaries. Therefore, to improve the stability and robustness of the clustering stage, we introduced Statistical Outlier Removal (SOR) filtering after semantic segmentation to clean the silique point cloud before applying DBSCAN clustering. This process effectively improves the quality of the point cloud prior to clustering and reduces the risk of outlier noise being misidentified as siliques.
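The SOR-then-DBSCAN pipeline above can be sketched end to end. The brute-force implementations below are illustrative stand-ins for the optimized workflow, and the eps, min_pts, k, and std_ratio values are assumptions chosen for the synthetic example, not the paper's tuned parameters.

```python
import numpy as np

def sor_filter(pts, k=16, std_ratio=2.0):
    """Statistical Outlier Removal: drop points whose mean distance to their
    k nearest neighbours exceeds (global mean + std_ratio * global std)."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # skip self (0)
    keep = knn_mean <= knn_mean.mean() + std_ratio * knn_mean.std()
    return pts[keep]

def dbscan(pts, eps, min_pts):
    """Brute-force DBSCAN; returns one label per point (-1 = noise)."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(row <= eps) for row in d]
    labels = np.full(len(pts), -1)
    cluster = 0
    for i in range(len(pts)):
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue                      # already assigned, or not a core point
        labels[i] = cluster
        seeds = list(neighbours[i])
        while seeds:                      # expand the cluster from core points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbours[j]) >= min_pts:
                    seeds.extend(neighbours[j])
        cluster += 1
    return labels

# Two well-separated synthetic "siliques" plus one far-away outlier.
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(0, 0.05, (40, 3)),
                   rng.normal(5, 0.05, (40, 3)),
                   [[50.0, 50.0, 50.0]]])
labels = dbscan(sor_filter(cloud), eps=0.5, min_pts=5)
count = len(set(labels) - {-1})   # predicted silique count
```

Running SOR first removes the isolated outlier before clustering, so it cannot seed a spurious cluster or bridge two adjacent siliques.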

Evaluation metrics

To evaluate the performance of our method in 3D semantic segmentation of canola siliques during the silique maturation stage, we adopted widely used metrics, including Overall Accuracy (OAcc), Class Accuracy (Acc), Mean Class Accuracy (mAcc), and Mean Intersection over Union (mIoU). These metrics assess the correctness and overlap between predicted and ground-truth point labels.

In addition, to evaluate the clustering results of individual siliques, we employed three metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Counting Accuracy (CA), which measure the numerical deviation between predicted and actual instance counts.

The definitions and formulations of all these evaluation metrics are summarized in Table 2. Here, nij denotes the number of points of ground-truth class i predicted as class j, and C is the total number of semantic classes.
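Using the confusion-matrix notation just introduced, the segmentation metrics reduce to a few array operations. A minimal sketch with an illustrative two-class (stem vs. silique) matrix:

```python
import numpy as np

def seg_metrics(conf: np.ndarray):
    """Compute OAcc, mAcc and mIoU from a confusion matrix where
    conf[i, j] = number of points of ground-truth class i predicted as j."""
    tp = np.diag(conf).astype(float)
    oacc = tp.sum() / conf.sum()
    acc_per_class = tp / conf.sum(axis=1)                   # per-class recall
    iou_per_class = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)
    return oacc, acc_per_class.mean(), iou_per_class.mean()

# Toy 2-class example: 100 stem points and 100 silique points.
conf = np.array([[90, 10],
                 [ 5, 95]])
oacc, macc, miou = seg_metrics(conf)
```

Note that mIoU is always the strictest of the three, since the union in its denominator also penalizes false positives.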

Table 2. Definitions of evaluation metrics used for semantic segmentation and clustering.

https://doi.org/10.1371/journal.pone.0336622.t002

Experiments

The difficulty of acquiring 3D point cloud data is significantly higher than that of 2D images, which is mainly reflected in two aspects: first, it requires specialized sensors and specific acquisition methods, and second, the complex structure of plants necessitates extensive manual annotation efforts [42]. Data augmentation techniques can effectively alleviate the issue of insufficient training data, providing ample data support for model optimization. The PointNext study [49] confirmed the effectiveness of basic augmentation strategies such as rotation and translation in improving model performance. Yao et al. [43] employed random point deletion, noise addition, and scaling for point cloud data of tomato plants across three key growth stages (seedling, flowering, and fruiting). Xie et al. [44] successfully expanded the dataset size to 10 times its original scale by applying operations such as point cloud cropping and jittering to public datasets like Plant3D [45].

Based on previous research experience and computational efficiency considerations, this study adopts a rigorous data processing workflow: first, the NeRF canola dataset is divided into training, validation, and test sets at a ratio of 7:1:2, followed by 10-fold data augmentation for each subset. The dataset is split before augmentation to mitigate potential data leakage risks. The specific augmentation strategy includes four dimensions: randomly discarding 0–30% of point cloud data, random rotation along the Z-axis between –180° and 180°, random scaling at a ratio of 0.9–1.1, and a 20% probability of flipping along the Y-axis.
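The four augmentation operations above can be sketched as a single numpy function applied to an (N, 3) XYZ cloud. This is an illustrative implementation of the stated strategy, not the authors' exact code; how RGB columns are carried along is left out for brevity.

```python
import numpy as np

def augment(points: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply the four augmentations described above to an (N, 3) XYZ cloud."""
    # 1) Randomly discard 0-30% of the points.
    keep = rng.random(len(points)) >= rng.uniform(0.0, 0.3)
    pts = points[keep].copy()
    # 2) Random rotation about the Z axis in [-180, 180] degrees.
    theta = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    pts = pts @ rot_z.T
    # 3) Random isotropic scaling in [0.9, 1.1].
    pts *= rng.uniform(0.9, 1.1)
    # 4) Flip along the Y axis with 20% probability.
    if rng.random() < 0.2:
        pts[:, 1] = -pts[:, 1]
    return pts

rng = np.random.default_rng(42)
aug = augment(np.random.rand(4096, 3), rng)
```

Applying this function ten times per sample yields the 10-fold expansion, with each copy drawing fresh random parameters.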

The experiments in this study were conducted in an Ubuntu 20.04 operating system environment, with an NVIDIA RTX 3090 GPU as the hardware configuration. Detailed hardware and software specifications are shown in Table 3. During model training, the number of training epochs was set to 250 to ensure sufficient convergence and performance optimization. The batch size was set to 16, with an initial learning rate of 0.0001, which was dynamically adjusted using a cosine annealing schedule, with a minimum learning rate of 0.00001. The optimizer used was AdamW, with a weight decay coefficient of 0.01 to effectively alleviate overfitting. To improve training efficiency, each batch processed 4,096 sampled points.

Semantic segmentation results

KAN-GLNet exhibited a clear convergence trend in both training and validation losses. As shown in Fig 8, the loss dropped rapidly during the first 100 epochs, indicating a fast parameter optimization process. Subsequently, the loss curves gradually stabilized, with the validation loss reaching a low plateau while the training loss continued to decline slowly, ultimately achieving full convergence at epoch 250. Although there was a slight gap between the training and validation loss and accuracy curves, the difference was minimal, primarily due to the limited number of original training samples (only 45). Overall, even under the constraint of limited sample size, KAN-GLNet demonstrated strong accuracy and effectively mitigated the overfitting problem in canola organ segmentation.

We compared five representative state-of-the-art (SOTA) models for point cloud semantic segmentation: SPG [46], PointMetaBase [47], PointVector [48], PointNext [49], and PointNet++ [17]. Except for PointNet++, which has a relatively simple architecture, the other models have multiple variants. To fairly evaluate the upper-bound performance of each model, we used the best-performing official version for all comparisons. SPG adopts a dual-branch structure, with the main branch selectable among PointNet++, PTV1 [50], and PTV2 [51]. In this study, we selected the strongest version, PTV2, as the main branch. The parameter settings of all comparison models followed the recommendations of their original papers.

As shown in Table 4, KAN-GLNet achieved the highest segmentation accuracy among all compared models, demonstrating outstanding segmentation capability. Specifically, it achieved 94.50% mIoU, 96.72% mAcc, and 97.77% OAcc, with only 5.72M parameters, which are 4.46, 2.43, and 1.72 percentage points higher than those of the second-best model, SPG, respectively, while using only 50.6% of its parameter count. In addition, compared to other models such as PointMetaBase and PointVector, KAN-GLNet not only achieved significantly higher segmentation accuracy but also had a much smaller model size. Overall, KAN-GLNet achieves the highest segmentation accuracy with a relatively compact model, effectively addressing the trade-off between model size and accuracy in plant point cloud segmentation.
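All three reported metrics can be computed from a per-point confusion matrix. A minimal sketch (the label layout and toy inputs are illustrative, not the authors' evaluation code):

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, num_classes):
    """mIoU, mAcc, and OAcc from per-point integer labels."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # rows: ground truth, cols: prediction
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp             # predicted as class c but not c
    fn = cm.sum(axis=1) - tp             # truly class c but missed
    miou = np.mean(tp / np.maximum(tp + fp + fn, 1))
    macc = np.mean(tp / np.maximum(cm.sum(axis=1), 1))
    oacc = tp.sum() / cm.sum()
    return miou, macc, oacc

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
miou, macc, oacc = segmentation_metrics(y_true, y_pred, 3)
```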

Table 4. A comparison of semantic segmentation performance across five networks, with the top results highlighted in bold.

https://doi.org/10.1371/journal.pone.0336622.t004

Fig 9 shows the visualized segmentation results of six models on the test dataset. Despite the dense spatial arrangement of siliques, KAN-GLNet was able to accurately distinguish individual siliques, especially showing strong performance in identifying the boundaries between siliques and stems. As an improved Transformer-based model, SPG also performed well in boundary distinction, but it often misclassified large regions of siliques or stems into the wrong category, leading to semantic confusion. PointMetaBase and PointVector exhibited similar performance. Their ability to distinguish boundaries in densely clustered siliques was limited, often misidentifying upper stems as siliques, although their accuracy improved in regions with sparse silique distribution. PointNext tended to misclassify stems as siliques over large areas in dense regions and also struggled with accuracy in sparse regions.

Fig 9. KAN-GLNet and baseline models segmentation visualization on test set.

https://doi.org/10.1371/journal.pone.0336622.g009

Canola silique clustering experiment

The DBSCAN algorithm relies on two critical hyperparameters: the neighborhood radius (ε), which defines the maximum distance within which points are considered neighbors, and the minimum number of samples (MinPts), which specifies how many neighboring points a point needs in order to be classified as a core point. In this study, the ε and MinPts values identified by the grid search described below were adopted as the final parameter combination for silique instance counting in canola plants.

We compared the clustering results with manual measurements to evaluate the accuracy of the improved DBSCAN algorithm in instance-level silique counting. As shown in Table 5, the silique counts for individual canola plants exhibited strong consistency. The overall evaluation metrics were as follows: a Mean Absolute Error (MAE) of 4.80, a Root Mean Square Error (RMSE) of 5.25, and a Counting Accuracy (CA) of 97.45%. These three metrics respectively validate the reliability of the clustering results in terms of error magnitude, prediction stability, and overall accuracy, indicating that the improved method can effectively reflect the actual number of plant organs.
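The three evaluation metrics can be computed directly from the paired counts. A minimal sketch; the CA formula below (one minus the mean relative error) is a common definition and is assumed rather than quoted from the paper, and the two plant counts are hypothetical, not values from Table 5:

```python
import math

def counting_metrics(true_counts, pred_counts):
    """MAE, RMSE, and counting accuracy (CA, %) for per-plant counts."""
    errs = [p - t for t, p in zip(true_counts, pred_counts)]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    rel = [abs(e) / t for e, t in zip(errs, true_counts)]
    ca = 100.0 * (1.0 - sum(rel) / len(rel))
    return mae, rmse, ca

# Hypothetical counts for two plants.
mae, rmse, ca = counting_metrics([78, 144], [80, 140])
```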

Table 5. Comparison between the actual number and the detected number of canola siliques.

https://doi.org/10.1371/journal.pone.0336622.t005

Building on the original DBSCAN clustering process, we introduced SOR filtering to remove local noise introduced during the semantic segmentation stage, thereby improving the purity of the point cloud input. This enhancement significantly increases the robustness of DBSCAN in densely entangled regions, enabling it to accurately separate neighboring organs even under severe clustering, occlusion, and overlap of siliques. The results demonstrate that our proposed clustering pipeline is not only simple and efficient but also well-suited to the high-precision and high-adaptability demands of organ-level recognition in plant phenotyping.
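SOR filtering flags a point as an outlier when its mean distance to its k nearest neighbours exceeds the global mean by more than a chosen number of standard deviations. A brute-force O(N²) sketch (parameters illustrative, not the study's settings):

```python
import numpy as np

def sor_filter(points, k=16, std_ratio=1.0):
    """Statistical outlier removal: drop points whose mean k-NN distance
    exceeds (global mean + std_ratio * global std)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # a point is not its own neighbour
    knn = np.sort(d, axis=1)[:, :k]       # distances to the k nearest neighbours
    mean_d = knn.mean(axis=1)
    thresh = mean_d.mean() + std_ratio * mean_d.std()
    return points[mean_d <= thresh]

rng = np.random.default_rng(0)
pts = np.vstack([rng.standard_normal((100, 3)), [[100.0, 100.0, 100.0]]])
kept = sor_filter(pts)                    # the far outlier is discarded
```

Production pipelines typically use a KD-tree-backed implementation instead, e.g. Open3D's `remove_statistical_outlier(nb_neighbors, std_ratio)`.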

Ablation experiment

In this study, we systematically evaluated the individual and combined effects of three key modules: Reverse Bottleneck KAN Convolutions, GLFN, and ContraNorm. These modules were integrated into the baseline PointNet++ model both independently and jointly for thorough validation. Table 6 presents a comparison of segmentation performance across different module combinations on the NeRF canola point cloud dataset.

Table 6. Ablation experiments on KAN-GLNet were conducted using different modules and combinations on the testing set.

https://doi.org/10.1371/journal.pone.0336622.t006

The experimental results show that introducing the Reverse Bottleneck KAN Convolutions leads to the most significant improvement in segmentation performance. This demonstrates that the module is highly effective in extracting spatial structural features, enabling the network to better recognize complex geometric boundaries such as the junctions between siliques and stems, while maintaining a relatively low number of parameters. The GLFN module enhances the model’s ability to perceive spatial details by fusing multi-scale global and local contextual information. ContraNorm improves the overall discriminative capability by introducing a contrastive learning mechanism that alleviates feature degradation and reduces overlap between different classes.

It is worth noting that when all three modules are combined to form the complete KAN-GLNet architecture, the model achieves the highest performance in terms of mean Intersection over Union and overall accuracy, indicating improved segmentation precision and stronger recognition of dominant structural components. However, the mean class accuracy is slightly lower than that of the model without the GLFN module. This is likely because the adaptive modulation in GLFN strengthens the fusion of global and local features, but at the same time introduces feature bias, which reduces the model’s attention to minority class samples and ultimately results in a slight drop in mean class accuracy.

Convolution comparison

In this section, we select PointNet++ as the baseline model. To reduce the spatial dimensionality of features, we replace the first convolution layer in the SA module with Kolmogorov–Arnold Convolutions, FastKAN Convolutions, and Reverse Bottleneck KAN Convolutions, with all convolution kernels set to size 3. Table 7 presents the comparison of segmentation accuracy and parameter count for these improved models on the NeRF canola point cloud dataset.

Table 7. Comparison of segmentation accuracy and parameter count between Kolmogorov-Arnold convolutions, FastKAN convolutions, reverse bottleneck KAN convolutions, and the baseline model.

https://doi.org/10.1371/journal.pone.0336622.t007

Experimental results show that introducing Kolmogorov–Arnold series convolutions significantly improves the model’s segmentation performance for canola siliques and stems. However, both the original Kolmogorov–Arnold Convolutions and the FastKAN variant, which replaces only the basis functions, substantially increase the parameter count, limiting the model’s application in resource-constrained environments. Given that this study aims to maintain segmentation accuracy while minimizing model complexity, we designed and introduced Reverse Bottleneck KAN Convolutions. The results indicate that this convolution not only slightly outperforms the model using FastKAN Convolutions in segmentation accuracy but also significantly reduces the parameter count, demonstrating an excellent balance between low parameter count and high precision and showing stronger potential for practical applications.
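The parameter inflation is visible in the edge parameterization: a plain linear edge stores one weight, whereas a KAN-style edge stores one coefficient per basis function. FastKAN [35] replaces B-spline bases with Gaussian RBFs; below is a minimal sketch of one such edge (our illustrative construction, not the authors' module):

```python
import numpy as np

def rbf_kan_edge(x, weights, centers, h=0.5):
    """One FastKAN-style learnable 1-D activation:
    phi(x) = sum_i w_i * exp(-((x - c_i) / h)**2).
    With G grid points, each edge costs G parameters instead of 1, so a
    naive KAN convolution is roughly G times heavier than its ordinary
    counterpart."""
    basis = np.exp(-(((x[..., None] - centers) / h) ** 2))
    return basis @ weights

centers = np.linspace(-1.0, 1.0, 8)   # G = 8 grid points per edge
weights = np.zeros(8)
weights[3] = 1.0                      # activate a single basis function
y = rbf_kan_edge(np.array([centers[3]]), weights, centers)
```

Restricting such edges to a narrow bottleneck, as the Reverse Bottleneck design does, is one way to keep this per-edge cost from multiplying across full-width channel pairs.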

Convolution replacement position

This paper introduces the Reverse Bottleneck KAN Convolution and applies it to the mini-PointNet part of the PointNet++ model by replacing the first convolutional layer, resulting in the design of KGL-PointNet. Table 8 presents the impact of replacing convolution layers at shallow, deep, and multiple levels on model performance and parameter count. Experimental results show that when the Reverse Bottleneck KAN Convolution is placed at the shallowest layer of the feature extraction network, the model achieves the highest mIoU, mAcc, and OAcc, with only a slight increase in parameters compared to replacing the second layer. This indicates that shallow layers are more effective at capturing low-level geometric features and local texture information in the point cloud, while deeper layers focus more on semantic-level representations, where the advantages of the Reverse Bottleneck KAN Convolution become less pronounced.

Table 8. Comparison of reverse bottleneck KAN convolutions replacing single-layer and multi-layer convolutions (Conv).

https://doi.org/10.1371/journal.pone.0336622.t008

Moreover, stacking multiple Reverse Bottleneck KAN Convolutions does not further improve segmentation performance and instead leads to a decline. The primary reason lies in the significant increase in parameter count, which introduces risks of overfitting and increased computational burden. On the one hand, a large number of parameters makes the network more prone to falling into local minima; on the other hand, excessive feature enhancement may result in unstable gradient updates or redundant information, thereby weakening the model’s generalization ability. Therefore, the optimal strategy is not to simply increase the number of modules but to analyze the importance of features at different layers in depth, adopting selective replacement or dynamic configuration methods to maximize the effectiveness of shallow-layer geometric feature extraction while maintaining model compactness.

DBSCAN optimal parameter search and SOR filter necessity

To determine the optimal DBSCAN parameter combination for the canola silique clustering task, this study adopts a grid search method to systematically evaluate the impact of the two key parameters, the neighborhood radius (ε) and the minimum number of samples (MinPts), on clustering performance; the search ranges and results are shown in Fig 10. The experimental results indicate that, at the optimal (ε, MinPts) combination identified by the search, the optimized DBSCAN demonstrates the best clustering performance in regions where the density of siliques varies significantly.
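The grid search can be reproduced with any DBSCAN implementation by scoring each (ε, MinPts) pair against a manual count. The sketch below uses a minimal pure-NumPy DBSCAN on synthetic blobs; the actual search ranges and optimum are those reported in Fig 10:

```python
import numpy as np

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one label per point (-1 = noise)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(row <= eps) for row in d]
    core = np.array([len(n) >= min_samples for n in neighbors])
    labels = np.full(len(points), -1)
    cluster = 0
    for seed in np.flatnonzero(core):
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        stack = [seed]
        while stack:                      # expand the cluster from core points
            j = stack.pop()
            for n in neighbors[j]:
                if labels[n] == -1:
                    labels[n] = cluster
                    if core[n]:
                        stack.append(n)
        cluster += 1
    return labels

# Two synthetic "siliques": tight blobs 1 unit apart.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0.0, 0.05, (30, 3)),
                 rng.normal(0.0, 0.05, (30, 3)) + [1.0, 0.0, 0.0]])
true_count = 2

best = None
for eps in (0.05, 0.1, 0.3, 0.6):
    for min_pts in (3, 5, 10):
        n_clusters = dbscan(pts, eps, min_pts).max() + 1
        err = abs(n_clusters - true_count)
        if best is None or err < best[0]:
            best = (err, eps, min_pts)
```

`sklearn.cluster.DBSCAN(eps=..., min_samples=...)` offers the same interface at scale.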

Fig 10. DBSCAN hyperparameter search.

(A) Impact of varying parameters on clustering performance (ground truth silique count: 78). (B) Impact of varying parameters on clustering performance (ground truth silique count: 144).

https://doi.org/10.1371/journal.pone.0336622.g010

The study found that a smaller ε value effectively distinguishes neighboring silique clusters that are spatially close, thereby avoiding under-segmentation. However, due to uneven point cloud density, it may also lead to some siliques being misclassified as noise. On the other hand, a moderate MinPts value not only suppresses over-segmentation caused by noise or residual points from semantic segmentation but also avoids mistakenly discarding true silique points, which occurs when the threshold is set too high.

Since raw silique point clouds often contain acquisition noise and segmentation errors, directly applying the standard DBSCAN tends to misclassify these outliers as independent clusters, resulting in the generation of false clusters and over-segmentation. This negatively impacts the stability and repeatability of subsequent instance counting and morphological analysis. As shown in Fig 11, standard DBSCAN often counts outliers as false clusters, whereas the improved method effectively addresses this issue.

Fig 11. Comparison of DBSCAN clustering with and without SOR filtering.

(A) Original silique labels. (B) Clustering result using DBSCAN. (C) Clustering result using DBSCAN with SOR filtering.

https://doi.org/10.1371/journal.pone.0336622.g011

By introducing statistical outlier removal (SOR) filtering into the clustering pipeline, we first eliminate the most sparsely distributed points caused by segmentation residuals and acquisition noise, thereby making silique cluster boundaries clearer and enhancing their connectivity. Subsequently, clustering is performed on the remaining point cloud using the optimized ε and MinPts parameters. This allows for an adaptive balance between intra-cluster connectivity and inter-cluster separation in both dense and sparse regions.

As a result, the generation of false small clusters is significantly reduced, and each real silique cluster can be accurately extracted, greatly enhancing the robustness and consistency of the clustering results. Ultimately, this improved pipeline proves particularly effective in extracting phenotypic parameters from plants with complex and overlapping structures such as fruits and leaves. It provides a reliable data foundation for high-throughput quantification of canola siliques, reduces the burden of manual correction, and offers solid algorithmic support for subsequent three-dimensional organ morphological studies.

Discussion

Multi-view point cloud acquisition and generation

We designed a dedicated 3D scanning system for plants aimed at efficiently acquiring multi-view point cloud data. The system fully considers the morphological characteristics of plants such as canola and supports continuous detail capture from multiple angles. Specifically, canola plants are placed on a low-speed rotating turntable, and the system uses fixed cameras to record video of the entire rotation process, avoiding mechanical interference and timing errors caused by frequent photographing. This method allows comprehensive acquisition of the plant’s 3D appearance without affecting its structure. Subsequently, the video is processed frame-by-frame using the ffmpeg tool to obtain continuous, complete, and multi-angle image sequences, providing sufficient data for point cloud reconstruction.
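Frame extraction of this kind is a one-line ffmpeg invocation; the file names and sampling rate below are illustrative, not the study's actual settings:

```shell
# Sample 5 frames per second from the turntable video into numbered JPEGs.
ffmpeg -i turntable.mp4 -vf "fps=5" -q:v 2 frames/frame_%04d.jpg
```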

During the shooting process, the system employs stable and continuous lighting to eliminate shadow interference and maintain uniform illumination. At the same time, a solid color background is used to effectively reduce background noise, further enhancing image quality and laying a solid foundation for point cloud generation.

For point cloud generation, we adopted NeRF technology, which can rapidly generate point cloud data with clear structure and high accuracy. Combined with the front-end video acquisition and image processing workflow, NeRF significantly improves the efficiency and quality of reconstruction, providing high-quality foundational data for subsequent semantic segmentation and clustering analysis.

Analysis of experimental results

During the silique maturation stage, canola siliques are arranged in a disorderly and heavily overlapping manner, while the number of branches varies significantly, posing considerable challenges for segmentation. KAN-GLNet effectively addresses these difficulties, successfully achieving precise segmentation, and the subsequently optimized DBSCAN algorithm effectively resolves the issue of numerous silique outliers being regarded as false clusters, providing a solid foundation for accurate counting and phenotypic analysis.

In the field of plant point cloud semantic segmentation, commonly used models are based either on point-based architectures or transformer architectures, both of which struggle to achieve an excellent balance between model accuracy and parameter size. For example, Dong et al. [20] proposed a segmentation model with a relatively low parameter count of 6.08M, achieving stem and leaf segmentation accuracies of 89.21% mIoU on sugarcane, 89.19% mIoU on maize, and 83.05% mIoU on tomato, demonstrating the potential of lightweight design. However, the relatively lower segmentation accuracy limits the precision of subsequent fine-grained phenotypic extraction. On the other hand, Ma et al. [22] proposed PSTNet, a transformer-based model that achieved 92.20% IoU on eggplant point clouds, but its large parameter size results in high computational cost and deployment difficulty, which is unfavorable for resource-constrained practical scenarios.

Our proposed model KAN-GLNet achieved 94.50% mIoU, 96.72% mAcc, and 97.77% OAcc on the NeRF canola test set, significantly outperforming the second-place SPG model based on PTV2 in terms of accuracy. Meanwhile, our model has only 5.72M parameters, which is far fewer than most Transformer-based models, reflecting an excellent balance between model accuracy and parameter size. This not only ensures high segmentation accuracy but also greatly reduces the computational resource requirements of the model, providing an efficient point cloud processing solution for agricultural edge devices.

Limitations

In this study, we acquired point clouds of canola plants using a self-designed multi-view image acquisition platform and applied KAN-GLNet for segmentation and clustering to extract silique counts. However, this process still faces two major limitations that need to be addressed.

As shown in Fig 12, canola plants growing naturally in field plots are dense and crowded, with severe silique overlap, making it difficult to capture complete multi-view images without damaging the overall plant structure. Although we transplanted individual plants into pots for imaging and used a low-speed turntable and uniform lighting to ensure data quality to some extent, this process inevitably altered the natural growth state of branches and leaves. As a result, the acquired point clouds differ from those in real field conditions, affecting the accuracy of segmentation and clustering outcomes in reflecting phenotypic traits under natural environments. Future work could explore non-contact aerial or multi-angle scanning technologies such as drones or mobile robotic arms to minimize plant structure interference. At the same time, future work could also shift the focus from individual canola plants to large-scale field-level phenotypic estimation.

Fig 12. Canola experimental field at Sichuan Agricultural University, Ya’an City, Sichuan Province, China.

https://doi.org/10.1371/journal.pone.0336622.g012

Our designed KAN-GLNet achieved 94.50% segmentation accuracy on the NeRF canola dataset, a significant improvement over the baseline PointNet++, which achieved only 70.40%, and it also outperformed other SOTA models. However, this improvement came with a parameter increase from the baseline's approximately 1M to 5.72M, posing challenges for deployment on resource-constrained edge devices or in real-time processing scenarios. Future work could further investigate model compression and lightweight optimization techniques, such as weight quantization, channel pruning, or knowledge distillation, to reduce model complexity while maintaining segmentation performance, thereby providing a viable pathway for applications in agricultural drone inspection, intelligent greenhouses, and mobile terminals.

This study also has certain limitations in the application of data augmentation strategies. Although the static augmentation methods employed (such as rotation, scaling, and random dropout) have been widely validated in the field of point cloud processing and can effectively simulate real-world disturbances such as pose variations and occlusions during field data acquisition, the diversity of the generated data remains constrained by the predefined transformation space, making it difficult to cover the complex variations present in all real scenarios. Moreover, to evaluate the robustness of the model when facing input perturbations, this study also applied data augmentation to the validation and test sets. While this design improved the stability of statistical evaluation under small-sample conditions and allowed a sharper focus on answering the robustness question of “whether the model performs consistently under real disturbances,” the results may not be directly comparable to traditional generalization performance evaluations based solely on original data. Future work could further introduce dynamic augmentation techniques and also provide evaluation results based on the original test set, so as to more comprehensively reflect the performance of the model.

Conclusion

In summary, this study utilized NeRF technology to generate accurate point clouds of canola plants and proposed a novel method for canola point cloud segmentation and silique counting based on KAN-GLNet and an optimized DBSCAN algorithm.

KAN-GLNet was developed based on the PointNet++ model and includes three major improvements: Reverse Bottleneck KAN Convolution, the GLFN feature modulation block, and ContraNorm. The Reverse Bottleneck KAN Convolution is used to enhance feature extraction capability, the GLFN feature modulation block is designed to optimize the fusion of local and global information, and ContraNorm, based on contrastive learning, is introduced to prevent over-smoothing. Experimental results show that KAN-GLNet outperforms multiple advanced models in canola semantic segmentation tasks, achieving 94.50% mIoU, 96.72% mAcc, and 97.77% OAcc, with a model parameter size of only 5.72M. This demonstrates that KAN-GLNet achieves an excellent balance between low parameter size and high accuracy, showing strong practical potential and providing a feasible and efficient technical solution for applications in agriculture, especially in resource-constrained environments such as edge devices, drone inspections, and mobile terminals.

Future research will focus more on phenotypic studies of canola in open field conditions, aiming to achieve the transition and expansion from single-plant to population-scale analysis. Given the dense distribution of plants, severe silique occlusion, and complex lighting conditions in natural environments, subsequent work could explore non-contact high-throughput acquisition methods in complex field scenarios, such as integrating drone-based aerial photography and multi-angle robotic arm scanning to obtain large-scale point cloud data under real-world conditions. To further improve the real-time performance and accuracy of field data processing, it is necessary to optimize the model architecture, introduce more efficient lightweight strategies, or incorporate multimodal information such as RGB images and depth data to enhance the model’s robustness and generalization ability. In addition to silique counting, future work could be extended to the accurate extraction of more canola phenotypic traits, such as silique length, stem diameter, and plant height, providing more comprehensive data support for canola phenotyping research and intelligent breeding.

References

  1. Gallinat AS, Ellwood ER, Heberling JM, Miller-Rushing AJ, Pearse WD, Primack RB. Macrophenology: Insights into the broad-scale patterns, drivers, and consequences of phenology. Am J Bot. 2021;108(11):2112–26. pmid:34755895
  2. Tran DT, Hertog MLATM, Tran TLH, Quyen NT, Van de Poel B, Mata CI, et al. Population modeling approach to optimize crop harvest strategy. The case of field tomato. Front Plant Sci. 2017;8:608. pmid:28473843
  3. Tang M, Tong C, Liang L, Du C, Zhao J, Xiao L, et al. A recessive high-density pod mutant resource of Brassica napus. Plant Sci. 2020;293:110411. pmid:32081260
  4. Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun ACM. 2021;65(1):99–106.
  5. Hu K, Ying W, Pan Y, Kang H, Chen C. High-fidelity 3D reconstruction of plants using neural radiance fields. Comput Electr Agric. 2024;220:108848.
  6. Zhang X, Srinivasan PP, Deng B, Debevec P, Freeman WT, Barron JT. NeRFactor: Neural factorization of shape and reflectance under an unknown illumination. ACM Trans Graph. 2021;40(6):1–18.
  7. Wu S, Wen W, Xiao B, Guo X, Du J, Wang C, et al. An accurate skeleton extraction approach from 3D point clouds of maize plants. Front Plant Sci. 2019;10:248. pmid:30899271
  8. Sodhi P, Vijayarangan S, Wettergreen D. In-field segmentation and identification of plant structures using 3D imaging. In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS); 2017. p. 5180–7. https://doi.org/10.1109/iros.2017.8206407
  9. Li D, Cao Y, Tang X-S, Yan S, Cai X. Leaf segmentation on dense plant point clouds with facet region growing. Sensors (Basel). 2018;18(11):3625. pmid:30366434
  10. Paulus S, Schumann H, Kuhlmann H, Léon J. High-precision laser scanning system for capturing 3D plant architecture and analysing growth of cereal plants. Biosyst Eng. 2014;121:1–11.
  11. Gibbs JA, Pound MP, French AP, Wells DM, Murchie EH, Pridmore TP. Active vision and surface reconstruction for 3D plant shoot modelling. IEEE/ACM Trans Comput Biol Bioinform. 2020;17(6):1907–17. pmid:31027044
  12. Xiang L, Bao Y, Tang L, Ortiz D, Salas-Fernandez MG. Automated morphological traits extraction for sorghum plants via 3D point cloud data analysis. Comput Electr Agric. 2019;162:951–61.
  13. Huang J, You S. Point cloud labeling using 3D convolutional neural network. In: 2016 23rd international conference on pattern recognition (ICPR); 2016. p. 2670–5. https://doi.org/10.1109/icpr.2016.7900038
  14. Das Choudhury S, Maturu S, Samal A, Stoerger V, Awada T. Leveraging image analysis to compute 3D plant phenotypes based on voxel-grid plant reconstruction. Front Plant Sci. 2020;11:521431. pmid:33362806
  15. Saeed F, Sun S, Rodriguez-Sanchez J, Snider J, Liu T, Li C. Cotton plant part 3D segmentation and architectural trait extraction using point voxel convolutional neural networks. Plant Methods. 2023;19(1):33. pmid:36991422
  16. Qi CR, Su H, Mo K, Guibas LJ. PointNet: Deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR); 2017. p. 652–60.
  17. Qi CR, Yi L, Su H, Guibas LJ. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Adv Neural Inform Process Syst. 2017;30.
  18. Ao Z, Wu F, Hu S, Sun Y, Su Y, Guo Q, et al. Automatic segmentation of stem and leaf components and individual maize plants in field terrestrial LiDAR data using convolutional neural networks. Crop J. 2022;10(5):1239–50.
  19. Guo R, Xie J, Zhu J, Cheng R, Zhang Y, Zhang X, et al. Improved 3D point cloud segmentation for accurate phenotypic analysis of cabbage plants using deep learning and clustering algorithms. Comput Electr Agric. 2023;211:108014.
  20. Dong S, Fan X, Li X, Liang Y, Zhang M, Yao W, et al. Automatic 3D plant organ instance segmentation method based on PointNeXt and Quickshift++. Plant Phenom. 2025;7(3):100065.
  21. Guo M-H, Cai J-X, Liu Z-N, Mu T-J, Martin RR, Hu S-M. PCT: Point cloud transformer. Comp Visual Med. 2021;7(2):187–99.
  22. Ma L, Kong L, Peng X, Wang K, Geng N. PSTNet: Transformer for aggregating neighborhood features in 3D point cloud semantic segmentation of eggplant plants. Sci Horticult. 2024;331:113158.
  23. Yang X, Miao T, Tao Y, Zhang B, Wu X, Han X, et al. PACANet: A paired-attention central axis aggregation network for plant population point cloud segmentation and phenotypic trait extraction—A case study on maize. Comput Electr Agric. 2025;237:110611.
  24. Li D, Quan C, Song Z, Li X, Yu G, Li C, et al. High-throughput plant phenotyping platform (HT3P) as a novel tool for estimating agronomic traits from the lab to the field. Front Bioeng Biotechnol. 2021;8:623705. pmid:33520974
  25. Paul A, Machavaram R, Ambuj, Kumar D, Nagar H. Smart solutions for capsicum harvesting: Unleashing the power of YOLO for detection, segmentation, growth stage classification, counting, and real-time mobile identification. Comput Electr Agric. 2024;219:108832.
  26. Tomar S. Converting video formats with FFmpeg. Linux J. 2006;2006(146):10.
  27. Tancik M, Weber E, Ng E, Li R, Yi B, Wang T, et al. Nerfstudio: A modular framework for neural radiance field development. In: ACM special interest group on computer GRAPHics and interactive techniques (SIGGRAPH) 2023 conference proceedings; 2023. p. 1–12.
  28. Schonberger JL, Frahm JM. Structure-from-motion revisited. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR); 2016. p. 4104–13.
  29. Girardeau-Montaut D. CloudCompare: 3D point cloud and mesh processing software; 2015. p. 197.
  30. Marin D, He Z, Vajda P, Chatterjee P, Tsai S, Yang F, et al. Efficient segmentation: Learning downsampling near semantic boundaries. In: 2019 IEEE/CVF international conference on computer vision (ICCV); 2019. p. 2131–41. https://doi.org/10.1109/iccv.2019.00222
  31. Yan X, Zheng C, Li Z, Wang S, Cui S. PointASNL: Robust point clouds processing using nonlocal neural networks with adaptive sampling. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR); 2020. p. 5589–98.
  32. Bodner AD, Tepsich AS, Spolski JN, Pourteau S. Convolutional Kolmogorov-Arnold networks; 2024. arXiv:2406.13155
  33. Guo X, Wang Y, Du T, Wang Y. ContraNorm: A contrastive learning perspective on oversmoothing and beyond; 2023. https://arxiv.org/abs/2303.06562
  34. Liu Z, Wang Y, Vaidya S, Ruehle F, Halverson J, Soljačić M. KAN: Kolmogorov-Arnold networks; 2024. arXiv:2404.19756
  35. Li Z. Kolmogorov-Arnold networks are radial basis function networks; 2024. arXiv:2405.06721
  36. Tang F, Xu Z, Huang Q, Wang J, Hou X, Su J, et al. DuAT: Dual-aggregation transformer network for medical image segmentation. In: Chinese conference on pattern recognition and computer vision (PRCV); 2023. p. 343–56.
  37. Ren B, Li Y, Mehta N, Timofte R, Yu H, Wan C, et al. The ninth NTIRE 2024 efficient super-resolution challenge report. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR); 2024. p. 6595–31.
  38. Bello I, Zoph B, Vaswani A, Shlens J, Le QV. Attention augmented convolutional networks. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV); 2019. p. 3286–95.
  39. Cao Y, Xu J, Lin S, Wei F, Hu H. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV) workshops; 2019.
  40. Ferrara R, Virdis SGP, Ventura A, Ghisu T, Duce P, Pellizzaro G. An automated approach for wood-leaf separation from terrestrial LIDAR point clouds using the density based clustering algorithm DBSCAN. Agric Forest Meteorol. 2018;262:434–44.
  41. Elnashef B, Filin S, Lati RN. Tensor-based classification and segmentation of three-dimensional point clouds for organ-level plant phenotyping and growth analysis. Comput Electr Agric. 2019;156:51–61.
  42. Ma X, Zhu K, Guan H, Feng J, Yu S, Liu G. High-throughput phenotyping analysis of potted soybean plants using colorized depth images based on a proximal platform. Remote Sensing. 2019;11(9):1085.
  43. Yao J, Gong Y, Xia Z, Nie P, Xu H, Zhang H, et al. Facility of tomato plant organ segmentation and phenotypic trait extraction via deep learning. Comput Electr Agric. 2025;231:109957.
  44. Xie K, Cui C, Jiang X, Zhu J, Liu J, Du A, et al. Automated 3D segmentation of plant organs via the plant-MAE: A self-supervised learning framework. Plant Phenom. 2025;7(2):100049.
  45. Conn A, Pedmale UV, Chory J, Stevens CF, Navlakha S. A statistical description of plant shoot architecture. Curr Biol. 2017;27(14):2078–2088.e3. pmid:28690115
  46. Han J, Liu K, Li W, Chen G. Subspace prototype guidance for mitigating class imbalance in point cloud semantic segmentation. In: European conference on computer vision. Springer; 2024. p. 255–72.
  47. Lin H, Zheng X, Li L, Chao F, Wang S, Wang Y, et al. Meta architecture for point cloud analysis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR); 2023. p. 17682–91.
  48. Deng X, Zhang W, Ding Q, Zhang X. PointVector: A vector representation in point cloud analysis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR); 2023. p. 9455–65.
  49. Qian G, Li Y, Peng H, Mai J, Hammoud H, Elhoseiny M, et al. PointNeXt: Revisiting PointNet++ with improved training and scaling strategies. Adv Neural Inform Process Syst. 2022;35:23192–204.
  50. Zhao H, Jiang L, Jia J, Torr PH, Koltun V. Point transformer. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV); 2021. p. 16259–68.
  51. Wu X, Lao Y, Jiang L, Liu X, Zhao H. Point transformer v2: Grouped vector attention and partition-based pooling. Adv Neural Inform Process Syst. 2022;35:33330–42.