Abstract
The evolving patterns of pollutant concentrations and their rigorous assessment are critical issues in contemporary environmental research and policy-making, with important practical implications for air quality management and regional pollution control. To better support such decisions, scientifically sound multi-criteria ranking methods have become a key research focus. In this paper, we propose a novel adaptive functional piecewise ordered weighted averaging (FP-OWA) method for ranking complex functional data. The method extends the existing functional piecewise ranking–weighting framework by integrating data smoothing, depth-based centrality measures, and rank-based aggregation. We systematically compare the performance of FP-OWA with several existing functional data ranking methods using Monte Carlo simulations. The results show that FP-OWA substantially improves ranking consistency and stability when the data are contaminated by white noise. We further apply FP-OWA to rank the daily average PM2.5 and O3 concentrations in 13 cities in the Beijing–Tianjin–Hebei region in 2023, accurately revealing the spatiotemporal differentiation patterns of regional pollution. These findings provide a solid technical basis for local governments to design pollution control strategies and improve air quality. Future research will focus on extending FP-OWA to highly nonlinear and complex functional data, further enhancing its computational efficiency to meet big-data processing requirements, and exploring additional application scenarios.
Citation: Li Y, Hu X, Tian M (2026) The adaptive functional piecewise ordered weighted averaging method and its application to pollutant concentration analysis. PLoS One 21(2): e0342192. https://doi.org/10.1371/journal.pone.0342192
Editor: Erfan Babaee Tirkolaee, Istinye University, Türkiye
Received: July 30, 2025; Accepted: January 19, 2026; Published: February 13, 2026
Copyright: © 2026 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The air quality data used in this study were originally obtained from the China air quality online monitoring and analysis platform (https://www.aqistudy.cn/historydata). However, as this platform does not provide a data download option, the final dataset was downloaded from the Environmental Research Database of the China Research Data Service (CNRDS) Platform (https://www.cnrds.com/Home/Index#/FinanceDatabase/DB/CEDS/ViewName/%E7%A9%BA%E6%B0%94%E8%B4%A8%E9%87%8F), which offers access to processed air quality data consistent with the official monitoring results.
Funding: The work was partially supported by the Beijing Natural Science Foundation “Theory, Methodology and Applications of Functional Hierarchical Quantile Regression Modeling” (No. 1242005); the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China “Robust Statistical Inference for Complex Data” (No.25XNN015); Ministry of Education Humanities and Social Sciences Research General Project “Research on Spatial-temporal Quantile Regression Modeling: Theoretical Methods and Applications” (No.25YJA910005).
Competing interests: The authors have no competing interests.
1 Introduction
Air pollution is a major environmental problem that affects human health, ecosystems, and the sustainable development of economies and societies worldwide. As a key economic growth center and urban agglomeration in northern China, the Beijing–Tianjin–Hebei region has attracted considerable attention due to its high population density, complex pollution sources, and pronounced regional transport characteristics. The complexity of its air pollution is reflected in the overlapping of multiple emission sources, the heterogeneity of spatiotemporal distribution, and the influence of inter-regional transport. First, the Beijing–Tianjin–Hebei region is characterized by diverse pollution sources, including industrial emissions, traffic exhaust, coal combustion, dust, and volatile organic compounds, with the contribution of each source varying substantially across cities [1]. Moreover, accelerated regional integration has led to traffic congestion, surging energy consumption, and concentrated pollutant emissions, further highlighting the difficulty of managing the superposition of multiple pollution sources [2]. Second, air pollution in the Beijing–Tianjin–Hebei region exhibits distinctive seasonal and spatial patterns. Higher concentrations of PM2.5 and NOx are observed in winter due to coal-fired heating and unfavorable meteorological conditions, while in summer, high temperatures and strong solar radiation promote photochemical reactions, exacerbating O3 pollution [3]. Spatially, annual average PM2.5 concentrations in cities in southern Hebei Province (such as Shijiazhuang, Xingtai, and Handan) are significantly higher than in Beijing and Tianjin because their industrial structures remain heavily dominated by secondary industry, resulting in a pronounced “low in the north, high in the south” gradient [4,5]. 
Furthermore, as part of the North China Plain, the topography and meteorological conditions of the Beijing–Tianjin–Hebei region intensify cross-regional transport of pollutants [6,7].
However, when confronted with complex and diverse air pollution problems, traditional assessment methods such as principal component analysis (PCA) and the analytic hierarchy process (AHP) have limited ability to capture interaction effects and spatiotemporal variation among pollutants in dynamic, high-dimensional settings. In contrast, functional data analysis (FDA) offers a new framework for handling complex environmental data by transforming discrete observations into continuous functions [8]. FDA began to develop in the early 1990s. Ramsay and Silverman [9] systematically presented key methods such as smoothing techniques, functional principal component analysis (FPCA), and functional linear models, and established the theoretical foundation of the field. Their work details the use of spline functions and Fourier bases to smooth observations, effectively suppressing noise while preserving the continuity of the data. Ferraty and Vieu [10] made important contributions to nonparametric methods, opening new avenues for functional data classification and regression. Horváth and Kokoszka [11] focused on statistical inference for functional data, including hypothesis testing and the construction of confidence intervals.
Data ranking problems are a key challenge in FDA applications. Existing functional data ranking methods can be broadly categorized into three classes: embedding-based, distance-based, and depth-based approaches. Embedding-based ranking methods typically map high-dimensional functional data into a low-dimensional space to simplify computation and the ranking procedure. A classical example is the Fourier transform, which extracts frequency features by transforming data from the time or spatial domain to the frequency domain, thereby enabling effective ranking of functional data with periodic variation [12]. PCA, as another embedding technique, performs linear dimensionality reduction by extracting principal component directions. This approach reduces dimensionality while preserving the main information in the data, thus improving ranking efficiency. However, its inherent linearity may limit its effectiveness when the underlying data structure is strongly nonlinear [13].
Distance-based metrics play an important role in functional data ranking. The core idea is to obtain a ranking by quantifying the dissimilarity between functions. Commonly used metrics include the L2 norm, Dynamic Time Warping (DTW), and the Fréchet distance. The L2 norm, as a basic measure, computes the integral of the squared difference between functions. Although it is computationally simple and easy to interpret, it cannot accommodate time shifts and may therefore fail to capture key curve features when analyzing nonlinear functional data. By optimizing the alignment path between curves, DTW allows elastic scaling along the time axis and effectively addresses phase differences in time series [14]. It has been widely applied in fields such as speech recognition and data mining [15]. The Fréchet distance evaluates curve similarity from a geometric perspective, taking into account both spatial location and continuity, which makes it particularly suitable for analyzing the overall structural characteristics of functions [16]. Other metrics, such as the Hausdorff distance and Wasserstein distance, are also useful in specific scenarios, but they generally exhibit limited robustness to noise. Wang et al. [17] showed that although DTW is effective for nonlinear time series, noise contamination can easily cause instability in the resulting rankings.
To overcome the limitations of traditional distance-based metrics, ranking methods based on statistical depth have been developed. These approaches originate from multivariate statistical analysis, where Tukey’s [18] pioneering work on half-space depth initiated systematic research on depth notions, followed by variants such as simplex depth and projection depth. Considering the infinite-dimensional nature of functional data, López-Pintado and Romo [19] proposed the concept of band depth (BD). In this framework, the depth of a curve is determined by evaluating its “centrality” within the sample: specifically, by computing the probability that the target curve lies within the band formed by other curves. A larger depth value indicates that the curve is closer to the core of the data distribution and therefore more representative of the overall pattern. This property allows BD to maintain stable ranking performance even in the presence of noise or outliers [20], and it has been successfully applied to dynamic functional data in areas such as environmental science and financial analysis. For example, in air quality monitoring, identifying the most representative pollutant concentration trajectories provides a sound scientific basis for environmental policy-making [21]. The subsequent Modified Band Depth (MBD) further improves ranking accuracy by incorporating amplitude information. New depth notions, such as h-mode depth [22] and random Tukey depth [23], have also introduced additional perspectives to this research field. Although depth-based methods offer clear advantages in terms of robustness to noise and outliers, the choice of depth measures and the tuning of associated parameters still require further investigation. Sun and Genton [24] applied BD to construct functional boxplots, which not only facilitate data visualization but also provide an effective tool for anomaly detection, thereby highlighting the practical value of depth-based methods.
With its core advantage of modeling discrete observations as continuous functions, functional data analysis (FDA) has become a key tool for addressing complex spatiotemporal dynamics in environmental and climate research. In recent years, numerous innovative studies have emerged, including applications to fine particulate matter assessment and analyses of the relationship between carbon emissions and economic activity. King et al. [25], for example, developed an ST-FDA model to investigate the spatiotemporal variation of fine particulate matter components in the United States. Their approach treats annual concentration curves as functional time series, decomposes the covariance function via FPCA, and uses Kriging to predict concentrations at unobserved locations. Using IMPROVE and CSN network data from 2003 to 2015, they identified key patterns such as higher nitrate levels in urban than in rural areas, seasonal peaks in colder months, and an overall downward trend over time. In the climate–economy domain, Elayouty and Abou-Ali [26] applied FDA methods to assess the dynamic effects of electricity consumption and economic growth on CO2 emissions from 1975 to 2014. They employed FPCA to extract the main modes of variation in emission trajectories, combined with functional linear regression to quantify the time-varying relationships among electricity consumption, economic growth, and CO2 emissions. An EM-based clustering procedure was then used to group countries into five distinct emission pathways, providing a refined empirical basis for climate policy design. Although Notter [27] focused on life cycle impact assessment (LCIA) of particulate matter and did not explicitly adopt the FDA framework, the continuous modeling of physicochemical properties—such as the particle size distribution and 34 chemical components—closely parallels FDA ideas.
Through four modules (Fate, Exposure, Effect, and Damage), the study transformed particulate matter emissions into a continuous health-damage function, quantified the differentiated toxic effects of particle size and solubility, and addressed limitations in traditional LCIA methods. This provides valuable methodological guidance for future FDA applications to the continuous dynamic assessment of particulate matter toxicity. Akopov et al.’s [28] ecological–economic modeling of Armenian enterprises, in which indicators such as industrial output and emissions are treated as time-dependent variables, also offers inspiration for extending FDA to ecological economics. In future work, it would be promising to represent the dynamic relationship between enterprise emissions and output as functional trajectories using FDA, and to combine this with functional deep ranking and related methods to identify optimal transformation paths that balance ecological and economic benefits, thereby further enriching cross-disciplinary applications of FDA.
This paper tackles the problem of functional data ranking by proposing an innovative method that integrates the strengths of several existing techniques, with a particular focus on improving the stability and adaptability of ranking procedures. The main contributions of this study can be summarized in three aspects. First, leveraging the continuous nature of functional data, spline smoothing is applied to preprocess the original observations, effectively filtering out high-frequency noise. Second, a density-based DBSCAN clustering algorithm is introduced to partition the functional data into several sub-intervals, within which a rank-based strategy is implemented to substantially reduce the influence of abnormal observations on the final ranking. Third, the Modified Band Depth (MBD) is incorporated to quantify the centrality of each curve and construct segment-level weights, thereby enhancing the robustness of the ranking with respect to outliers.
The remainder of this paper is organized as follows. Sect 2 reviews existing functional ranking methods and presents the proposed FP-OWA approach. Sect 3 discusses the asymptotic properties of the proposed estimator. Sect 4 reports extensive numerical experiments that evaluate the performance of the method under different scenarios. Sect 5 applies FP-OWA to rank the 2023 daily average PM2.5 and O3 concentrations in 13 cities in the Beijing–Tianjin–Hebei region, providing a solid technical basis for formulating regional pollution control strategies and improving air quality. Finally, Sect 6 concludes the paper and outlines potential directions for future research.
2 Introduction to ranking methods
2.1 Existing functional ranking methods
FPCA is a dimensionality reduction technique for functional data that captures the main modes of variation over a continuous domain, thereby reducing dimensionality while preserving essential features. First, to reflect the continuous nature of the data, discrete observations are smoothed—typically using spline or kernel smoothing—to transform the observed points into continuous curves $X_i(t)$. Next, the sample covariance function $C(s,t)$ is estimated and subjected to eigen-decomposition to obtain eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$ and eigenfunctions $\phi_1(t), \phi_2(t), \ldots$. On the basis of the eigenvalues, the cumulative proportion of variance $\sum_{k=1}^{K}\lambda_k / \sum_{k}\lambda_k$ is calculated to determine the number $K$ of principal components. The principal component scores
$$\xi_{ik} = \int \{X_i(t) - \bar{X}(t)\}\, \phi_k(t)\, dt$$
for each sample curve are then computed using the eigenfunctions $\phi_k(t)$. Finally, the samples are ranked according to these scores. In most applications, the first principal component score $\xi_{i1}$ is used as the primary ranking criterion. When multiple principal components are used, a composite score $S_i = \sum_{k=1}^{K} w_k \xi_{ik}$ is constructed, where $w_k = \lambda_k / \sum_{l=1}^{K}\lambda_l$. FPCA can eliminate redundant dimensions and effectively capture dynamic variation patterns in functional data. However, it is sensitive to noise and outliers, the selection of eigenfunctions depends on accurate covariance estimation, and nonlinear dependencies cannot be directly accommodated.
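As an illustration, this ranking pipeline can be sketched for curves discretized on a common grid, approximating the covariance function by the $T \times T$ sample covariance matrix and using eigenvalue-proportional weights for the composite score. This is a minimal NumPy sketch under those assumptions, not the authors' implementation:

```python
import numpy as np

def fpca_rank(curves, var_threshold=0.9):
    """Rank sample curves by a composite FPCA score.

    curves: (N, T) array of smoothed curves on a common grid.
    Returns integer ranks, with 0 assigned to the largest score.
    """
    X = curves - curves.mean(axis=0)           # center across samples
    C = (X.T @ X) / (X.shape[0] - 1)           # T x T sample covariance
    eigvals, eigvecs = np.linalg.eigh(C)       # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # smallest K whose cumulative variance share reaches the threshold
    K = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), var_threshold) + 1
    scores = X @ eigvecs[:, :K]                # principal component scores
    w = eigvals[:K] / eigvals[:K].sum()        # eigenvalue-proportional weights
    composite = scores @ w
    return np.argsort(np.argsort(-composite))  # rank 0 = largest score
```

Note that the sign of each eigenvector is arbitrary, so the induced order may be reversed; in practice the sign is fixed by convention (e.g., positive mean loading).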
WLR is a functional data analysis method for time series that transforms discrete observations into continuous curves and produces robust ranking results through segmentation and weighting [29]. The discrete observations are first connected using piecewise linear interpolation. When handling missing values, if a missing value occurs at an intermediate time point, that point is skipped and the adjacent observed points are directly connected; if the missing value is located at either endpoint, multiple imputation is used to complete the data. Once the continuous function $Y_i(t)$ is obtained, the time series is divided into several segments. Suppose $Y_i(t)$ is partitioned into $m$ segments over the interval $[0, T]$, with segment boundaries $0 = \tau_0 < \tau_1 < \cdots < \tau_m = T$. The segment weight $w_j = (\tau_j - \tau_{j-1})/T$ is calculated according to the proportion of the length of the $j$-th segment relative to the total duration of the interval. To reduce the influence of outliers, observations within each segment are ranked: in the $j$-th segment, the rank statistic $R_{ij}$ is computed for each sample $i$, with average ranks assigned in the case of ties. Finally, a composite score for each sample is obtained using the segment weights $w_j$ and rank values $R_{ij}$, expressed as $S_i = \sum_{j=1}^{m} w_j R_{ij}$. Through segmented processing, WLR partially preserves the continuity and temporal characteristics of the data, making it suitable for scenarios with relatively stable temporal dynamics. However, its ranking results can be sensitive to the choice of segment length and the number of observation points. In addition, WLR does not explicitly account for the central tendency of the data and remains relatively sensitive to outliers.
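The weighted rank-scoring step can be sketched as follows, assuming the curves are already interpolated onto a common grid and the segment boundaries are supplied as index pairs (the intersection-based segmentation itself is omitted):

```python
import numpy as np
from scipy.stats import rankdata

def wlr_score(curves, boundaries):
    """WLR-style composite score: length-proportional segment weights
    times within-segment ranks of the segment means.

    curves: (N, T) array; boundaries: list of (start, end) index pairs.
    """
    T = curves.shape[1]
    score = np.zeros(curves.shape[0])
    for start, end in boundaries:
        w = (end - start) / T                    # weight = segment length share
        seg_mean = curves[:, start:end].mean(axis=1)
        R = rankdata(seg_mean)                   # average ranks on ties
        score += w * R
    return score
```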
The h-mode depth, proposed by Cuevas et al. [30], is a depth measure based on nonparametric kernel density estimation. Unlike depth notions defined in terms of geometric position, h-mode depth focuses on the local probability density of curves in the function space. It can effectively identify the most densely populated regions of the distribution and is particularly suitable for capturing the central features of asymmetric or multimodal functional data. Let $X_1(t), \ldots, X_N(t)$ be functional observations on an interval $T$. For any function $x(t)$, its h-mode depth is defined as
$$D_h(x) = \frac{1}{N} \sum_{i=1}^{N} K\!\left(\frac{\|x - X_i\|}{h}\right),$$
where $K(\cdot)$ is a kernel function, $\|\cdot\|$ is a norm on the function space (typically the $L_2$ norm), and $h$ is a bandwidth parameter controlling the neighborhood size. A curve with a larger value of $D_h(x)$ has more neighboring sample curves in its vicinity and is therefore more representative of the overall distribution. Given the sample $X_1, \ldots, X_N$ and the depth measure $D_h$, the ranking is obtained by ordering the sample curves in decreasing depth, that is, from the largest to the smallest value of $D_h$.
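A sketch of this depth for discretized curves, assuming a Gaussian kernel, an $L_2$ norm approximated by the trapezoid rule, and the common median-distance heuristic for the bandwidth $h$ (none of these choices is prescribed by the definition above):

```python
import numpy as np
from scipy.integrate import trapezoid

def h_mode_depth(curves, grid, h=None):
    """h-mode depth via a Gaussian kernel on pairwise L2 distances.

    curves: (N, T) array evaluated on `grid`; h defaults to the
    median positive pairwise distance (a common heuristic).
    """
    diffs = curves[:, None, :] - curves[None, :, :]
    # L2 distance between every pair of curves (trapezoid rule)
    d = np.sqrt(trapezoid(diffs**2, grid, axis=2))
    if h is None:
        h = np.median(d[d > 0])
    K = np.exp(-0.5 * (d / h) ** 2)              # Gaussian kernel
    return K.mean(axis=1)                        # depth of each curve

def rank_by_depth(depth):
    """Indices ordered from deepest (most central) outward."""
    return np.argsort(-depth)
```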
Tukey depth is a classical centrality measure in multivariate statistics, but its direct computation in infinite-dimensional function spaces faces severe computational complexity. To address this issue, Cuesta-Albertos et al. [23] proposed the random Tukey depth (RTD), which uses random projections to reduce high-dimensional calculations to a sequence of one-dimensional problems, thereby substantially lowering computational cost while preserving key statistical properties. Let $x$ be the function to be evaluated, and let $u_1, \ldots, u_V$ denote a set of unit projection directions drawn at random from a Sobolev space. For each direction $u_j$, one computes the univariate Tukey depth of the projection $\langle x, u_j \rangle$ with respect to the projected sample $\langle X_1, u_j \rangle, \ldots, \langle X_N, u_j \rangle$. The RTD of $x$ is then defined as the minimum of these univariate depths,
$$\mathrm{RTD}(x) = \min_{1 \le j \le V} D_1\big(\langle x, u_j \rangle;\ \langle X_1, u_j \rangle, \ldots, \langle X_N, u_j \rangle\big),$$
where $D_1(y; Z)$ denotes the univariate Tukey depth. RTD is affine-invariant and highly sensitive to shape outliers. Its core idea follows the minimum-depth principle: if a curve appears as an outlier along at least one projection direction, it is regarded as an outlier in the overall functional space. The ranking rule induced by RTD is consistent with that of the h-MD described above.
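A sketch of RTD for discretized curves; Gaussian random directions on the evaluation grid stand in for sampling from a Sobolev space, and the univariate Tukey depth is computed from ranks (ties are ignored for simplicity):

```python
import numpy as np

def univariate_tukey_depth(points):
    """D1(y; Z): minimum of the left and right empirical tail
    proportions at each point (rank-based, ties ignored)."""
    n = len(points)
    order = np.argsort(points, kind="stable")
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    left = ranks / n                          # proportion <= y
    right = (n - ranks + 1) / n               # proportion >= y
    return np.minimum(left, right)

def random_tukey_depth(curves, n_dirs=50, seed=0):
    """Minimum over random unit directions of the univariate Tukey
    depth of the projected sample."""
    rng = np.random.default_rng(seed)
    N, T = curves.shape
    depth = np.full(N, np.inf)
    for _ in range(n_dirs):
        u = rng.standard_normal(T)
        u /= np.linalg.norm(u)                # random unit direction
        proj = curves @ u                     # project all curves
        depth = np.minimum(depth, univariate_tukey_depth(proj))
    return depth
```

A magnitude outlier projects to an extreme point along almost every direction, so its RTD collapses to the minimum attainable value $1/N$.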
2.2 Adaptive Functional Piecewise Ordered Weighted Averaging Method (FP-OWA)
Motivated by the limitations of the existing ranking methods, we propose an adaptive functional piecewise ordered weighted averaging (FP-OWA) procedure that combines spline smoothing, measures of data centrality, and band-depth–based weighting. The goal is to enhance the robustness and adaptability of functional rankings. The definition and computation of FP-OWA are described below.
Functional observations are typically recorded at discrete time points and are often contaminated by noise and missing values. Directly performing ranking analysis on such raw discrete data may deteriorate both the accuracy and stability of the results. To alleviate these issues, we first smooth the data and impute missing values so as to obtain continuous and stable sample trajectories. Assume that each sample satisfies
$$y_{ij} = f_i(t_j) + \varepsilon_{ij}, \qquad j = 1, \ldots, T,$$
where $y_{ij}$ is the observation of the $i$-th curve at time $t_j$, $f_i(t_j)$ is the underlying true function value, and $\varepsilon_{ij}$ is random noise with $E(\varepsilon_{ij}) = 0$ and $\operatorname{Var}(\varepsilon_{ij}) = \sigma^2 < \infty$. To estimate $f_i(t)$, we expand it in a spline basis,
$$f_i(t) = \sum_{k=1}^{K} c_{ik} B_k(t),$$
where $B_k(t)$, $k = 1, \ldots, K$, are the basis functions, $c_{ik}$ are the unknown coefficients, and $K$ is the number of basis functions.
Minimizing the residual sum of squares alone may lead to overfitting and spurious oscillations around noisy observations. We therefore introduce a roughness penalty to balance fidelity to the data and smoothness of the estimated function. The objective function for the $i$-th curve is
$$\mathrm{PENSSE}_{\lambda}(f_i) = \sum_{j=1}^{T} \{y_{ij} - f_i(t_j)\}^2 + \lambda \int \{f_i''(t)\}^2 \, dt,$$
where the first term measures the data-fitting error and the second term penalizes curvature. The smoothing parameter $\lambda$ controls the trade-off between these two components. In this study, $\lambda$ is selected by generalized cross-validation (GCV),
$$\mathrm{GCV}(\lambda) = \frac{T \sum_{j=1}^{T} \{y_{ij} - \hat{f}_i(t_j)\}^2}{\{T - \operatorname{tr}(S_{\lambda})\}^2},$$
where $S_{\lambda}$ is the smoothing matrix, and the optimal value is obtained by minimizing $\mathrm{GCV}(\lambda)$. Missing observations are imputed using spline interpolation, yielding a complete smoothed trajectory $\hat{f}_i(t)$ for each curve, which facilitates the subsequent segmentation step.
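This smoothing-and-imputation step can be sketched with SciPy: `make_smoothing_spline` fits exactly this penalized criterion and selects $\lambda$ by GCV when `lam=None`. Here NaNs stand in for missing observations and are filled by evaluating the fitted spline (an illustrative choice; the paper only specifies spline interpolation):

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

def smooth_and_impute(t, y):
    """Fit a penalized smoothing spline to one noisy curve, with the
    smoothing parameter chosen by GCV, then evaluate on the full grid
    so that missing (NaN) observations are imputed."""
    obs = ~np.isnan(y)
    spline = make_smoothing_spline(t[obs], y[obs])   # lam=None -> GCV
    return spline(t)

# Example: noisy sine with two missing observations
t = np.linspace(0, 1, 100)
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(100)
y[[10, 50]] = np.nan                                 # simulate missing values
f_hat = smooth_and_impute(t, y)
```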
The smoothed functional data typically exhibit different local behaviours over the time domain. To accurately capture these local characteristics, we segment the time axis into several subintervals according to the similarity of temporal patterns, rather than directly working with the raw observation times. Since local density changes are difficult to detect in the original time scale, we transform the segmentation problem into a clustering problem in a multidimensional feature space and employ the density-based DBSCAN algorithm to automatically identify stable time intervals.
Unlike traditional segmentation methods that rely on geometric intersections, we define similarity among time points in terms of functional features. At each time point, we consider not only the function value but also its first and second derivatives. An initial feature vector is constructed as
$$v_t = \big(\tilde{f}(t),\ \alpha \tilde{f}'(t),\ \beta \tilde{f}''(t)\big),$$
where $\tilde{f}(t)$, $\tilde{f}'(t)$, and $\tilde{f}''(t)$ are the normalized function value, first derivative, and second derivative, respectively, and $\alpha$ and $\beta$ are tuning constants. To remove potential multicollinearity and extract the main modes of variation, we standardize the feature matrix and perform principal component analysis (PCA), retaining the first $q$ principal components as the final $q$-dimensional feature vector $z_t$. Each time point on the original axis is thus mapped to a point in feature space; if two time points are close in this space, it indicates that the values, rates of change, and curvature of all sample curves are highly consistent at these times, suggesting a locally stable regime.
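A sketch of this feature construction, stacking the values, first derivatives, and second derivatives of all curves at each time point, standardizing, and reducing to $q$ components by PCA via SVD. Finite differences replace analytic spline derivatives here, an assumption for illustration only:

```python
import numpy as np

def time_features(curves, grid, alpha=1.0, beta=1.0, q=2):
    """Per-time-point feature vectors z_t built from values, first and
    second derivatives of all curves, standardized and PCA-reduced.

    curves: (N, T) smoothed curves on `grid`; alpha, beta weight the
    derivative channels; q: number of retained principal components.
    """
    d1 = np.gradient(curves, grid, axis=1)            # first derivative
    d2 = np.gradient(d1, grid, axis=1)                # second derivative
    # one row per time point, columns = (value, a*d1, b*d2) of each curve
    F = np.hstack([curves.T, alpha * d1.T, beta * d2.T])
    F = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)  # standardize columns
    U, s, _ = np.linalg.svd(F, full_matrices=False)   # PCA via SVD
    return U[:, :q] * s[:q]                           # (T, q) scores z_t
```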
DBSCAN clustering is then applied to the feature vectors $z_t$. The algorithm does not require the number of clusters to be specified in advance and instead classifies points into core points, border points, and noise according to local density. It involves two key parameters: the neighborhood radius ε (eps) and the minimum number of points in a neighborhood, minPts. To reduce subjectivity, we use a grid-search strategy rather than ad-hoc rules:
(1) For ε, we adopt the k-nearest-neighbor distance elbow method, computing for each feature point the Euclidean distance to its k-th nearest neighbor and setting the search range for ε to the 10–95% quantiles of these distances.
(2) For minPts, the search range covers values from a small lower bound (twice the feature dimension, 2q [31]) to relatively large, stable values.
For each parameter pair (ε, minPts), DBSCAN is run and the average silhouette coefficient is computed,
$$SC = \frac{1}{T} \sum_{t=1}^{T} \frac{b(t) - a(t)}{\max\{a(t),\, b(t)\}},$$
where
$$a(t) = \frac{1}{|C_i| - 1} \sum_{s \in C_i,\, s \neq t} d(z_t, z_s)$$
is the within-cluster dissimilarity of time point $t$ belonging to cluster $C_i$ with $|C_i|$ members, $d(\cdot,\cdot)$ is the Euclidean distance in feature space, and
$$b(t) = \min_{j \neq i} \frac{1}{|C_j|} \sum_{s \in C_j} d(z_t, z_s)$$
is the separation from the nearest neighboring cluster $C_j$. The parameter combination that maximizes $SC$ is selected as optimal.
Using the optimal ε and minPts, we perform DBSCAN clustering on the standardized features. Each time point receives a cluster label (core, border, or noise), with noise typically labeled as 0. To preserve temporal continuity, noise labels are replaced by the labels of neighboring non-noise points. Along the time axis, a segmentation boundary is declared between two adjacent time points only if their (adjusted) cluster labels differ. The resulting breakpoints partition the entire time domain into m contiguous segments.
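The grid search and segmentation can be sketched with scikit-learn. The eps candidates come from quantiles of the k-nearest-neighbor distances, minPts starts at twice the feature dimension, and noise labels are forward-filled, all as described above; the grid resolution and the function name `segment_times` are illustrative choices:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors

def segment_times(z, k=4):
    """Grid-search (eps, minPts) for DBSCAN on time-point features z
    of shape (T, q), score each run by the average silhouette
    coefficient, and turn the best label sequence into contiguous
    segments given as (start, end) index pairs."""
    T, q = z.shape
    # eps candidates: quantiles of the k-NN distances (elbow-style range)
    knn = NearestNeighbors(n_neighbors=k + 1).fit(z)
    kdist = knn.kneighbors(z)[0][:, -1]
    eps_grid = np.quantile(kdist, np.linspace(0.10, 0.95, 10))
    best, best_sc = None, -np.inf
    for eps in eps_grid:
        for min_pts in range(2 * q, 2 * q + 6):
            labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(z)
            if len(set(labels) - {-1}) < 2:
                continue                     # silhouette needs >= 2 clusters
            sc = silhouette_score(z, labels)
            if sc > best_sc:
                best_sc, best = sc, labels
    # forward-fill noise labels (-1) to preserve temporal continuity
    labels = best.copy()
    for t in range(T):
        if labels[t] == -1:
            labels[t] = labels[t - 1] if t > 0 else labels[labels != -1][0]
    # a boundary is declared wherever adjacent labels differ
    cuts = [0] + [t for t in range(1, T) if labels[t] != labels[t - 1]] + [T]
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]
```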
After segmentation, different segments may exhibit distinct patterns of functional variation. To quantify the representativeness of each segment, we employ the modified band depth (MBD) to measure how centrally a curve lies relative to all curves within a segment: the larger the MBD, the closer the curve is to the overall trend. For the $i$-th segment, let $f_1(t), \ldots, f_{N_i}(t)$, $t \in T_i$, denote the sample curves restricted to this segment. The MBD of curve $f_j(t)$ in segment $i$ is
$$\mathrm{MBD}_i(f_j) = \binom{N_i}{2}^{-1} \sum_{1 \le k < l \le N_i} \frac{1}{|T_i|} \int_{T_i} \mathbf{1}\{\min(f_k(t), f_l(t)) \le f_j(t) \le \max(f_k(t), f_l(t))\} \, dt,$$
where $N_i$ is the number of curves in segment $i$, $T_i$ is the time interval of that segment, $|T_i|$ is its length, and $\mathbf{1}\{\cdot\}$ is the indicator function, which equals 1 if $f_j(t)$ lies between $f_k(t)$ and $f_l(t)$ at time $t$ and 0 otherwise.
The average MBD score within segment $i$ is
$$\overline{\mathrm{MBD}}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} \mathrm{MBD}_i(f_j),$$
and the segment weights are obtained by normalizing these averages,
$$w_i = \frac{\overline{\mathrm{MBD}}_i}{\sum_{i'=1}^{m} \overline{\mathrm{MBD}}_{i'}}.$$
Thus, segments whose curves are more centrally located in the sense of MBD receive larger weights in the final ranking.
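The MBD weighting can be sketched with the rank-based identity: at each time point, a curve with 0-based rank $r$ among $N$ curves lies inside $r(N-1-r)$ bands formed by pairs of other curves, plus the $N-1$ bands of pairs that include the curve itself. This avoids enumerating all pairs; ties are broken arbitrarily in this sketch:

```python
import numpy as np

def mbd(curves):
    """Modified band depth of each curve within one segment, computed
    per time point from ranks rather than by enumerating curve pairs.

    curves: (N, L) values of the segment-restricted curves.
    """
    N, L = curves.shape
    # 0-based rank of each curve at each time point (ties broken arbitrarily)
    r = np.argsort(np.argsort(curves, axis=0), axis=0).astype(float)
    below, above = r, N - 1 - r
    # bands containing the curve: pairs straddling it + pairs including it
    inside = below * above + (N - 1)
    pairs = N * (N - 1) / 2
    return inside.mean(axis=1) / pairs

def segment_weights(curves, segments):
    """MBD-based weights: average depth per segment, normalized."""
    avg = np.array([mbd(curves[:, a:b]).mean() for a, b in segments])
    return avg / avg.sum()
```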
Within each segment, the integral of a curve reflects its cumulative behavior over that time interval. Because raw integral values may be affected by scale and numerical range, we convert them into ranks to mitigate the influence of magnitude differences and extreme values.
For the $j$-th curve in segment $i$, we define the segment-wise integral as
$$I_{ij} = \int_{T_i} f_j(t) \, dt,$$
where $T_i$ is the time interval of segment $i$. Using the spline expansion of $f_j(t)$, the integral can be written as
$$I_{ij} = \sum_{k=1}^{K} c_{jk} \int_{T_i} B_k(t) \, dt.$$
After computing $I_{ij}$ for all curves in segment $i$, we rank these values in ascending order and obtain the rank statistic $R_{ij}$, which represents the relative position of curve $j$ within that segment. Ties are handled by assigning average ranks.
Finally, to derive a global score for each curve, we aggregate the segment-wise ranks using the MBD-based weights,
$$\mathrm{Score}_j = \sum_{i=1}^{m} w_i R_{ij},$$
where a large $\mathrm{Score}_j$ indicates that curve $j$ tends to occupy better positions across the segments. These overall scores induce the final FP-OWA ranking of the functional data.
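The final aggregation can be sketched as follows, combining segment-wise integral ranks with precomputed MBD-based weights; trapezoid integration on the evaluation grid substitutes for the exact spline-basis integral:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import rankdata

def fp_owa_scores(curves, grid, segments, weights):
    """FP-OWA aggregation: within each segment, rank the curves'
    integrals in ascending order (average ranks on ties), then
    combine the ranks with the MBD-based segment weights.

    curves: (N, T); segments: list of (start, end) index pairs;
    weights: one normalized weight per segment.
    """
    score = np.zeros(curves.shape[0])
    for (a, b), w in zip(segments, weights):
        I = trapezoid(curves[:, a:b], grid[a:b], axis=1)  # segment integrals
        R = rankdata(I)                                   # ascending ranks
        score += w * R
    return score                       # larger score = higher overall rank
```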
2.3 Time complexity of ranking methods
To comprehensively assess the computational feasibility of the FP-OWA method, as well as the FPCA, WLR, h-MD, and RTD methods, particularly in scenarios where the sample size N increases significantly or the number of time points per sample T grows substantially, this section discusses the asymptotic time complexity of these methods. Suppose the dataset consists of N independent samples, each with functional data observed at T time points. Additionally, we assume all methods are implemented in a standard computational environment and do not account for constant factors or lower-order terms. Furthermore, for methods involving segmentation, we introduce the average number of segments S to quantify the impact of segmentation on overall complexity.
For the FPCA method, the computational procedure can be decomposed into several main steps. First, data smoothing is performed using spline basis functions [26], with a computational complexity of $O(NT)$ (treating the number of basis functions as a constant). Second, estimating the covariance matrix requires computing all pairwise cross-products among the samples, which has complexity $O(NT^2)$. Finally, an eigenvalue decomposition is carried out on a $T \times T$ covariance matrix, where standard algorithms have a worst-case complexity of $O(T^3)$ [32]. Therefore, the overall time complexity of FPCA is $O(NT^2 + T^3)$. The method is highly sensitive to increases in T, since the cubic term dominates the computational cost. However, when T is fixed and N increases, the method remains relatively efficient. It may nonetheless become a bottleneck for long time-series data, and is thus more suitable for applications with time series of moderate length.
For the WLR method, the computation starts with forming continuous curves, a step that involves simple interpolation and has complexity $O(NT)$. Its segmentation mechanism is based on detecting intersections between curves: for each of the $O(N^2)$ curve pairs, T time points are examined and sign changes are computed, giving a complexity of $O(N^2 T)$. The subsequent steps of sorting intersections, removing duplicates, and generating segment boundaries are, in the worst case, still dominated by $O(N^2 T)$. For each segment, sorting is performed on the segment means of the N samples, with complexity $O(N \log N)$, giving $O(SN \log N)$ over all S segments. The weighted scoring part has complexity $O(NS)$. Therefore, the overall complexity of the WLR method is $O(N^2 T + SN \log N)$. When S is small, lower-order terms can be neglected and the complexity simplifies to $O(N^2 T)$, indicating that WLR is computationally inefficient in scenarios where N is large.
For the h-MD method, the core computation lies in constructing the pairwise distance matrix between samples. According to the definition given earlier, computing the L2-norm distance between any two samples Xi and Xj requires traversing T time points, so the complexity of a single distance calculation is $O(T)$. To obtain the depth ranking of all samples, an $N \times N$ pairwise distance matrix must be computed. Therefore, the overall time complexity of h-MD is $O(N^2 T)$. This method is quite sensitive to increases in the sample size N; when N is large, its computational efficiency becomes relatively low.
For the RTD method, the randomized Tukey depth reduces dimensionality through projections, and its computation consists of two parts: projection and one-dimensional ordering. Let V be the number of random projection directions, and project the N samples onto these V directions. Each projection involves computing an inner product between T-dimensional vectors, so the complexity of a single projection is $O(T)$, and the total projection cost is $O(VNT)$. For each projection direction, the univariate Tukey depth of the N projected points is computed. This typically requires sorting the projected values, and standard sorting algorithms have complexity $O(N \log N)$. Over V directions, this step has a total complexity of $O(VN \log N)$. Therefore, the overall time complexity of RTD is $O(VNT + VN \log N)$. Since V is treated as a constant parameter, the complexity of RTD is mainly dominated by the NT term. Compared with h-MD, RTD offers a clear computational advantage in scenarios where N is large.
For the proposed FP-OWA method, the computational pipeline proceeds as follows. First, in the spline smoothing step we use GCV to select the smoothing parameter λ [33]. The overall complexity of this step is on the order of O(NT). Since PCA is then performed in a low-dimensional feature space, its O(T) cost is comparatively negligible. Second, DBSCAN clustering is applied to the feature sequence at T time points. With spatial indexing, the average-case complexity is O(T log T); for dense data, however, the complexity degrades to O(T²) [28]. Thus, this step costs O(T log T) in total. As it is computed only once, it usually does not dominate the overall runtime, but it does introduce variability in the number of segments S. Subsequently, the MBD weighting step computes the pairwise band depth of the N curves within each segment. This requires exhaustive checking at the segment's L grid points (with the segment lengths summing to T), with complexity O(N²L) [19]. In this paper, we instead adopt a fast MBD variant based on stack-ranking [34], which reduces the per-time-point cost from quadratic to quasi-linear, O(N log N). Consequently, over all S segments, computing the depth weights and performing integration and ranking has an overall complexity of O(TN log N). Meanwhile, variability in the DBSCAN-based segmentation leads to a U-shaped runtime curve: a larger S increases the ranking overhead, but at the same time shortens L and thereby reduces the MBD cost within each segment, which can ultimately improve overall efficiency.
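The quadratic-to-quasi-linear reduction for MBD rests on a rank identity: with two-curve bands (J = 2), a curve of rank r at a given time point lies inside (r − 1)(N − r) + (N − 1) of the N(N − 1)/2 bands. A minimal NumPy sketch of a fast rank-based MBD (a generic implementation, assumed to be equivalent in spirit to the stack-ranking variant of [34]):

```python
import numpy as np

def mbd(X):
    """Modified band depth (J = 2) of N curves observed at L grid points.

    At each time point, a curve with rank r lies inside
    (r-1)*(N-r) + (N-1) of the C(N,2) two-curve bands, so the whole
    computation costs O(L*N*log N) instead of the naive O(L*N^2).
    X : (N, L) array; ties are broken arbitrarily.
    """
    N, L = X.shape
    # ranks of each curve at each time point (1-based)
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    inside = (ranks - 1) * (N - ranks) + (N - 1)     # bands containing the curve
    return inside.mean(axis=1) / (N * (N - 1) / 2)   # average proportion over time
```

For three parallel constant curves, the middle curve attains depth 1 and the two extremes 2/3, as expected of a centrality measure.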
3 Asymptotic properties
In this section, we establish the asymptotic properties of the proposed adaptive functional piecewise ordered weighted averaging (FP-OWA) method through Theorems 1–5. We begin by stating the assumptions required for the theoretical analysis of the estimator.
A1: The true functions fi(t) belong to the Sobolev space W^{m,2}[0,1] with m ≥ 2, so that their derivatives up to order m are square-integrable, that is, ∫_0^1 (fi^(m)(t))² dt < ∞.
A2: The noise terms εij are i.i.d. with E[εij] = 0 and Var(εij) = σ² < ∞, and they have bounded higher-order moments.
A3: The observation points tj lie in [0,1] and are either uniformly distributed or satisfy a minimum-spacing condition: there exists a constant c > 0 such that min_j (t_{j+1} − t_j) ≥ c/T, where T is the number of observation points.
A4: For each true change-point τ of the standardized data, there exists a constant Δ > 0 such that, within a neighbourhood of τ, the density p(t) exhibits a jump, |p(τ+) − p(τ−)| ≥ Δ.
A5: Within each segment the functions are Hölder continuous: there exist constants α ∈ (0,1] and L > 0 such that, for all t, s in the same segment, |fi(t) − fi(s)| ≤ L|t − s|^α. Finite jumps (discontinuities) are allowed at the segment boundaries, so that the smoothing procedure does not blur the change-points.
A6: The clustering parameters satisfy ε_N → 0 and N·ε_N^ζ → ∞ as N → ∞, where ζ denotes the data dimension.
A7: The collection of functions {fi} is bounded in a compact subset of the continuous function space, and the measure ψ is the Lebesgue measure.
A8: The band-depth kernel h(f,g) is bounded and Lipschitz continuous: there exists a constant M > 0 such that |h(f,g)| ≤ M for all f and g, and h satisfies a Lipschitz condition in each argument.
A9: For any two distinct curves fi ≠ fk and any segment Ij, the segment scores are almost surely distinct, P( ∫_{Ij} fi dψ = ∫_{Ij} fk dψ ) = 0, so that the ranking functional admits no ties in the limit.
A10: The number of segments S is finite, and the weights w1, …, wS are positive and sum to one.
Note 1: Assumption A1 ensures that the roughness penalty term is well-defined and bounded. Assumption A2 provides the moment conditions needed to apply the law of large numbers and Bernstein-type inequalities. Assumption A3 guarantees that the condition number of the design matrix remains bounded, preventing ill-conditioning of the spline basis in regions with sparse observations. Assumption A4 is a regularity condition required for the convergence of DBSCAN. Assumption A5 enforces within-segment smoothness so that local oscillations do not interfere with density estimation. Assumption A6 specifies the asymptotic rate of the DBSCAN tuning parameters and ensures that the neighbourhood radius ε_N adapts to the sample size, thereby avoiding both missed boundaries and unintended merging of segments. Assumption A7 restricts the function space so that the associated U-statistics are well-defined, while the use of the Lebesgue measure guarantees that the depth computation is measure-invariant. Assumption A8 provides variance control for the U-process and justifies the use of Hoeffding-type inequalities. Assumption A9 prevents discontinuities of the ranking functional without imposing parallel-curve assumptions and ensures the applicability of Slutsky’s lemma. Finally, Assumption A10 (finite segmentation) prevents divergence of infinite sums and guarantees the consistency of the aggregated score.
Theorem 1: Suppose Assumptions A1–A3 hold and consider the functional data model yij = fi(tj) + εij. Let f̂i(t) be the estimator obtained by expanding fi in a spline basis with a roughness penalty. Then, as n → ∞,
sup_{t ∈ [0,1]} |f̂i(t) − fi(t)| → 0 in probability,
where n is the number of observation points and the smoothing parameter λ satisfies λ → 0 and nλ^{1/(2m)} → ∞.
Theorem 1 shows that, under Assumptions A1–A3, the penalized spline estimator converges uniformly in probability to the true function fi(t) as the number of observation points n increases. Thus, the smoothing step effectively removes noise without distorting the underlying functional trend, providing a sound basis for the accuracy of the subsequent ranking procedure.
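The smoothing step of Theorem 1 can be illustrated with a small GCV search. The sketch below uses a discrete second-difference (Whittaker-type) penalty as a stand-in for the paper's spline penalty; the λ grid and the penalty form are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def smooth_gcv(y, lambdas=np.logspace(-2, 4, 30)):
    """Roughness-penalized smoothing with GCV-selected lambda.

    Minimizes ||y - f||^2 + lam * ||D f||^2 on the observation grid,
    where D is the second-difference matrix, and picks lam by
    generalized cross-validation GCV(lam) = n*RSS / (n - tr H)^2.
    Returns the smoothed curve and the selected lambda.
    """
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)      # (n-2, n) second differences
    P = D.T @ D
    best = (np.inf, None, None)
    for lam in lambdas:
        H = np.linalg.solve(np.eye(n) + lam * P, np.eye(n))  # hat matrix
        f = H @ y
        rss = np.sum((y - f) ** 2)
        gcv = n * rss / (n - np.trace(H)) ** 2
        if gcv < best[0]:
            best = (gcv, lam, f)
    return best[2], best[1]
```

For each candidate λ, the hat matrix H = (I + λDᵀD)⁻¹ gives the fitted values, and the λ minimizing the GCV score is retained, mirroring the data-driven selection assumed in Theorem 1.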
Theorem 2: Suppose Assumptions A4–A6 hold. For the standardized functional data, let C = {τ1, …, τK} denote the set of true change-points. When the sample size is sufficiently large and the DBSCAN parameters ε and minPts are chosen appropriately, the set of estimated change-points Ĉ obtained from the DBSCAN-based automatic segmentation satisfies, as N → ∞,
P( H(Ĉ, C) > δ ) → 0,
where H denotes the Hausdorff distance and δ > 0 is arbitrary.
Theorem 2 indicates that, under Assumptions A4–A6, the automatic segmentation produced by DBSCAN is consistent: the Hausdorff distance between the estimated and true sets of segment boundaries converges to zero as the sample size grows. This means that the segmentation accurately captures local structural changes in the data, thereby avoiding subjectivity and bias in determining the segment boundaries.
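For intuition, a minimal DBSCAN on a one-dimensional feature sequence can be written as follows (an illustrative implementation; the paper's feature construction and parameter schedules are not reproduced):

```python
import numpy as np

def dbscan_1d(x, eps, min_pts):
    """Minimal DBSCAN for a 1-D feature sequence.

    Returns an integer label per point, with -1 marking noise. A point
    is a core point if its eps-neighbourhood contains at least min_pts
    points; clusters are grown from core points by breadth-first search.
    """
    n = len(x)
    labels = np.full(n, -1)
    # neighbourhoods under |x_i - x_j| <= eps
    nbrs = [np.flatnonzero(np.abs(x - x[i]) <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in nbrs])
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster          # start a new cluster at core point i
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable point joins the cluster
                if core[j]:
                    queue.extend(nbrs[j])
        cluster += 1
    return labels
```

Boundaries between consecutive runs of different labels then serve as the estimated change-points of the segmentation.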
Theorem 3: Suppose Assumptions A7–A8 hold. For the collection of functions {fi} in the function space F, the sample modified band depth MBDn(f) is a uniformly strongly consistent estimator of the population band depth MBD(f), that is, as n → ∞,
sup_{f ∈ F} |MBDn(f) − MBD(f)| → 0 almost surely,
where F represents the function space under consideration.
Theorem 4: Suppose Assumptions A1–A10 hold. For each curve fi(t), define Aj(fi) as the integral of fi(t) over the j-th segment. Let Rj(fi) denote the true rank of fi(t) on this segment, and let R̂j(f̂i) be the rank computed from the estimated function f̂i. Then, as n → ∞,
P( R̂j(f̂i) = Rj(fi) ) → 1,
so that the estimated ranks converge to the true segment-wise ranks.
Theorem 5: Suppose Assumptions A1–A10 hold. Define the true overall score of curve fi by S(fi) = Σ_{j=1}^{S} wj Rj(fi) and the estimated overall score by Ŝ(f̂i) = Σ_{j=1}^{S} wj R̂j(f̂i). Then, as n → ∞,
Ŝ(f̂i) → S(fi) in probability.
Moreover, the ranking induced by Ŝ(f̂i) converges to the true ranking induced by S(fi).
Theorem 3 shows that, under Assumptions A7–A8, the estimator of the modified band depth (MBD) is consistent. Theorems 4 and 5 further establish that, under Assumptions A1–A10, both the estimated within-segment ranks and the global aggregated scores converge to their true counterparts. Taken together, these results provide theoretical justification that, even in the presence of noise, missing values, or outliers, the FP-OWA method yields stable rankings that faithfully reflect the true relative ordering of the sample curves.
Note 2: Detailed proofs of Theorems 1–5 are provided in the S1 Appendix.
4 Simulation study
To evaluate the performance of the proposed FP-OWA method for ranking functional data, we conduct a Monte Carlo simulation study. Specifically, we perform 500 independent replications to compare five methods—FPCA, WLR, h-MD, RTD, and FP-OWA—in terms of ranking accuracy and robustness under different noise settings.
4.1 Generation of simulated data
The functional observations are generated as the sum of a linear trend, a periodic component, and noise,
Xi(tj) = ai·tj + sin(2π·tj + bi) + εij,
where ai denotes the slope of a randomly generated linear trend component, bi is a random phase shift of the periodic component, and εij represents the noise term. To mimic disturbances that arise in practical applications, we design five noise schemes based on this baseline model:
(1) Gaussian noise: Independent Gaussian perturbations with mean 0 and standard deviations 0.01, 0.1, and 0.5 are added to the simulated curves. This setting examines how well the methods adapt to conventional Gaussian disturbances.
(2) Spike noise: To model sporadic extreme measurements, we randomly select 1%, 10%, and 30% of the sampling points and inflate their original values by a factor of 5, thereby assessing the ability of the methods to handle abrupt outliers.
(3) Amplitude noise: Localized sharp oscillations are created by amplifying the signal magnitude at randomly chosen 30%, 50%, and 80% of the time points, which is used to test how the methods deal with pronounced local changes.
(4) Poisson noise: Poisson-distributed perturbations with rate parameters 0.01, 0.1, and 0.3 are superimposed to emulate discrete event noise, allowing us to evaluate robustness against low-frequency but high-intensity discrete shocks.
(5) Laplace noise: Heavy-tailed Laplace perturbations with mean 0 and scale parameters 0.01, 0.1, and 0.3 are added. Compared with Gaussian noise, the heavier tails provide a more stringent test of the methods’ tolerance to extreme values.
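The five contamination schemes can be sketched as follows; the level argument stands in for the scheme-specific parameter above (standard deviation, point fraction, or rate), and apart from the factor-of-5 spike inflation stated in the text, the exact parameterization is an assumption for illustration:

```python
import numpy as np

def add_noise(X, kind, level, rng=None):
    """Contaminate curves X of shape (N, T) with one of the five
    simulation noise schemes described in the text."""
    rng = np.random.default_rng(rng)
    Y = X.copy()
    if kind == "gaussian":
        Y += rng.normal(0.0, level, X.shape)
    elif kind == "spike":                      # inflate a `level` fraction of points 5x
        mask = rng.random(X.shape) < level
        Y[mask] *= 5.0
    elif kind == "amplitude":                  # amplify magnitude at chosen points
        mask = rng.random(X.shape) < level
        Y[mask] *= 1.0 + rng.random(mask.sum())
    elif kind == "poisson":
        Y += rng.poisson(level, X.shape)
    elif kind == "laplace":
        Y += rng.laplace(0.0, level, X.shape)
    else:
        raise ValueError(kind)
    return Y
```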
The original data-generating process, along with the five noise-contaminated variants, is visualized in Fig 1.
4.2 Evaluation criteria
To compare the ranking accuracy and robustness of the five methods—FPCA, WLR, h-MD, RTD and FP-OWA—we adopt the following three criteria:
(1) Kendall’s Tau Test: We compute Kendall’s tau by comparing all possible pairs of observations and evaluating the proportion of concordant versus discordant pairs:
τ = (P − Q) / [n(n − 1)/2],
where P is the number of concordant pairs, i.e., pairs (i, j), for which the relative order of the two observations is the same in both rankings, and Q is the number of discordant pairs, i.e., pairs for which the relative order is reversed. Values of τ closer to 1 indicate higher agreement between the rankings.
(2) Spearman’s Rho test: The original scores are first converted to ranks, and then the Pearson correlation coefficient between the two rank sequences is computed as
ρ = 1 − 6 Σ di² / [n(n² − 1)],
where di denotes the difference between the two ranks for observation i. Values of ρ closer to 1 correspond to stronger rank correlation.
(3) Mean Absolute Error (MAE): MAE is defined as the average absolute difference between the predicted scores and the true integral scores, MAE = (1/n) Σi |ŝi − si|. Smaller MAE values indicate that the predicted scores are closer to the true scores and thus reflect better ranking accuracy.
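Both rank-correlation criteria follow directly from their definitions; a tie-free NumPy sketch:

```python
import numpy as np

def kendall_tau(a, b):
    """Kendall's tau = (P - Q) / (n(n-1)/2) over all pairs (no ties)."""
    n = len(a)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign((a[i] - a[j]) * (b[i] - b[j]))  # +1 concordant, -1 discordant
    return 2 * s / (n * (n - 1))

def spearman_rho(a, b):
    """Spearman's rho = 1 - 6*sum(d_i^2) / (n(n^2-1)) on the rank scale."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    d = ra - rb
    n = len(a)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```

For data with ties, scipy.stats.kendalltau and scipy.stats.spearmanr provide tie-corrected versions of the same statistics.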
4.3 Simulation results and analysis
For each simulated data set, we apply FPCA, WLR, h-MD, RTD and FP-OWA to obtain the corresponding rankings. The resulting rankings are then compared with those obtained from the data without added white noise, and the three evaluation criteria are computed. Tables 1–5 report the simulation results under Gaussian white noise, Spike noise, Amplitude noise, Poisson noise and Laplace noise, respectively.
As shown in Table 1, the FP-OWA method exhibits strong stability and high ranking consistency under all conditions. In particular, when the noise intensity varies, FP-OWA is able to maintain high accuracy and low error. In terms of overall performance, FP-OWA consistently achieves the highest values of Kendall’s tau and Spearman’s rho with relatively narrow confidence intervals, indicating a high level of agreement with the true ranking and excellent accuracy. At the same time, its run time is comparatively short, demonstrating a clear efficiency advantage over the other four methods. The WLR method attains medium levels of Kendall’s tau and Spearman’s rho, but its run time is substantially longer than that of FP-OWA and FPCA, and it increases sharply as the sample size grows, which is likely due to the high computational cost of its segmentation mechanism. The performance of FPCA in terms of Kendall’s tau and Spearman’s rho is relatively poor. A plausible explanation is that FPCA performs a global decomposition of variation, whereas in local integration tasks the weights of principal components should adapt to the underlying pattern of variation. In the present simulation design, multiple independent modes of variation are present, so ranking based solely on the first principal component score can be biased. For h-MD and RTD, the ranking consistency measures are close to zero. This is because both methods are designed to quantify centrality: curves with intermediate values attain the largest depth, while curves with the largest and smallest values lie on the periphery. As a result, depth-based methods cannot recover rankings that are defined purely in terms of magnitude. Overall, FP-OWA shows a clear advantage in ranking functional data contaminated by Gaussian white noise.
As shown in Table 2, the FP-OWA method still outperforms the other approaches when ranking functional data contaminated by spike noise. It is worth noting that although FPCA attains the highest rank-correlation coefficients under spike noise, its MAE is the largest among the five methods. This reflects the fact that, under the spike-noise design described earlier, the main differences among curves are captured in their overall amplitude, and FPCA is very effective at extracting such global shape features. However, while the first principal component can recover the relative ordering of the curves, it fails to accurately reproduce the underlying amplitude-based scores. In contrast, the proposed FP-OWA method achieves a more favorable balance: it maintains high rank consistency while attaining the lowest MAE among all competing methods. This indicates that FP-OWA not only preserves high ranking accuracy, but also provides more accurate numerical estimates. Combined with its relatively fast run time, FP-OWA is clearly superior to FPCA and the other competing methods in terms of quantitative accuracy.
Amplitude noise differs from spike noise in that it perturbs the data amplitude more uniformly and is often used to mimic sensor instability or environmental interference. As shown in Table 3, the experimental results further confirm the overall superiority of the FP-OWA method. Compared with the WLR method based on local cross-information, the FPCA method based on variance maximization, and the depth-based h-MD and RTD methods, FP-OWA exhibits greater stability when dealing with large-amplitude perturbations in magnitude. More specifically, as the noise level increases to 0.8, FP-OWA not only maintains the lowest MAE, but also achieves higher ranking consistency measures than the other four methods. However, in contrast to the results under the other types of noise, Table 3 also shows that all methods perform relatively poorly in this setting. A plausible explanation is that the more uniformly distributed amplitude noise is intrinsically harder to handle than the abrupt spikes in spike noise. This suggests that future work could focus on further improving ranking methods specifically for such amplitude-noise scenarios.
As shown in Table 4, under Poisson noise the FPCA method performs poorly, as it cannot effectively accommodate the signal-dependent heteroscedasticity induced by the Poisson distribution; its ranking performance is markedly inferior to that of the h-MD and RTD methods and even breaks down in some scenarios. In contrast, FP-OWA exhibits excellent adaptivity, achieving the highest ranking concordance and the lowest mean absolute error across all noise levels. The advantage is particularly pronounced in terms of computational efficiency: when the sample size is n = 300 and the noise level is high, the WLR method is burdened by the computation of a large number of spurious crossing points, leading to a sharp increase in runtime, whereas FP-OWA, owing to its efficient feature aggregation algorithm, keeps the computation time below 0.3 seconds. This provides strong evidence for the stability of FP-OWA in handling Poisson-type functional data.
As shown in Table 5, under the experimental setting with Laplace noise contamination, the FP-OWA method demonstrates remarkable robustness for heavy-tailed data. The results indicate that the conventional FPCA method fails to accommodate the non-Gaussian characteristics of the data, leading to substantially lower ranking correlations. In contrast, benefiting from its distinctive aggregation mechanism, FP-OWA effectively suppresses the impact of impulsive noise and achieves the best ranking accuracy and the smallest MAE across all noise levels. More importantly, the local fluctuations induced by Laplace noise cause a dramatic increase in the computational complexity of the WLR method, whereas FP-OWA consistently maintains high computational efficiency, further confirming its effectiveness in handling non-Gaussian, heavy-tailed noise environments.
Taken together, the extensive simulation study under a variety of noise settings—including Gaussian, impulsive, amplitude, Poisson, and Laplace disturbances—demonstrates that the proposed FP-OWA method achieves substantially better overall performance than the prevailing approaches FPCA, WLR, h-MD, and RTD. Specifically, its advantages manifest in three main aspects. First, FP-OWA exhibits excellent ranking consistency and quantitative accuracy. Whether under standard Gaussian noise or more complex amplitude perturbations, FP-OWA consistently attains high ranking concordance and low MAE. In particular, in the presence of impulsive noise, FP-OWA simultaneously recovers the true ranking order and the underlying signal amplitudes with high precision, thereby overcoming the drawback of FPCA, which attains relatively high rank correlation at the cost of large estimation errors. At the same time, the experiments confirm that depth-based methods defined via centrality (h-MD, RTD) are not suitable for such amplitude-based ranking tasks. Second, FP-OWA is more robust to complex noise distributions. When confronted with the signal-dependent heteroscedasticity induced by Poisson noise and the heavy tails associated with Laplace noise, the variance-maximization-based FPCA method almost breaks down, whereas FP-OWA, leveraging its distinctive feature aggregation and smoothing mechanisms, effectively suppresses various forms of non-Gaussian noise and outliers, exhibiting strong adaptivity and stability across all complex noise settings. Finally, FP-OWA possesses superior computational scalability. In contrast to the WLR method, whose runtime grows almost exponentially when dealing with large sample sizes and high-frequency fluctuating noise due to the need to process a large number of local crossing points, FP-OWA consistently maintains high computational efficiency.
4.4 Sensitivity analysis
Considering that the performance of the method may depend on the two parameters ε and minPts of the DBSCAN clustering algorithm, this subsection uses the functional data generation process from Eq 11. Gaussian noise with mean zero and a fixed standard deviation is added to simulate the perturbations in real data, and sensitivity analysis is conducted using the FP-OWA method. The k-NN distance plot is utilized to identify the elbow point, which helps in selecting the ε parameter. Furthermore, a sensitivity heatmap shows how the number of clusters changes with variations in ε and minPts, thereby guiding the selection of robust parameters.
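The curve behind the elbow plot is simply the set of k-th nearest-neighbour distances in ascending order; a minimal sketch (k = 4 matches the 4-nearest-neighbor distance used in Fig 2):

```python
import numpy as np

def knn_distance_curve(X, k=4):
    """Sorted k-NN distances used for the DBSCAN elbow plot.

    For each sample (row of X), compute the Euclidean distance to its
    k-th nearest neighbour, then return these distances in ascending
    order; the elbow of this curve is the usual heuristic for eps.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    d = np.sqrt(d2)
    kth = np.sort(d, axis=1)[:, k]   # column 0 is the zero self-distance
    return np.sort(kth)
```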
In Fig 2, the x-axis represents the indices of all data points sorted in ascending order by their “4-nearest neighbor distance,” while the y-axis shows the distance from each data point to its 4th nearest neighbor. The curve exhibits an inflection point where the slope shifts from gradual to steep, serving as the natural boundary between “core points” and “non-core points.” From the figure, it can be seen that the elbow occurs around 2.739, providing a reference range for selecting the ε parameter. The goal of the elbow plot is to identify the global distance boundary for “core points,” focusing only on the point density. This characteristic is particularly crucial when dealing with short sequences, highly seasonal data, or data with holiday (weekend) effects. Short sequences often suffer from sparse data due to the small sample size, and the elbow plot’s distance threshold can help exclude spurious sparse points caused by insufficient samples. Highly seasonal or holiday-affected data tend to form locally dense clusters at specific time points, and the global distance threshold of the elbow plot provides a foundational reference for such non-uniform dense structures, preventing misidentification of core points due to local fluctuations. However, the DBSCAN clustering result is determined by both ε and minPts. Especially in the case of the above-mentioned data features, a single ε threshold is insufficient to adapt to complex structures. Therefore, further plotting of the DBSCAN sensitivity heatmap is necessary to assist in selecting the optimal combination of parameters.
From Fig 3, we can observe the changes in the number of clusters mapped across ε and minPts. In the lower-parameter red region, a high number of clusters is observed. This combination can accurately split sub-clusters of different seasons when dealing with highly seasonal data, but caution must be taken to avoid spurious cluster splitting in short sequences caused by an excessively small minPts. In the higher-parameter blue region, clusters tend to merge, reducing noise interference in short sequences, but potentially masking local special clusters related to holiday effects. Specifically, when ε is in the range of 0.8-1.5 and minPts is between 0 and 5, the heatmap shows a yellow-red tone corresponding to 4-6 clusters. This parameter range is most suitable for adapting to the three types of special data. For short sequences, a moderate ε combined with a smaller minPts can capture a limited number of core clusters while ensuring an adequate sample size. For highly seasonal data, this range helps distinguish independent clusters for different seasons while avoiding excessive merging within a cycle. For holiday (weekend) effect data, this range allows special patterns before and after holidays to be identified as independent sub-clusters, preventing confusion with regular data. When ε exceeds 2.3 or minPts exceeds 20, the heatmap predominantly shows blue, with the number of clusters reduced to 0-2. An excessively large ε can cause cross-cycle cluster merging in seasonal data, masking cycle differences, while an overly stringent minPts may misclassify valid short-sequence samples or local dense points related to holiday effects as noise, ultimately leading to cluster merging or failure.
The above sensitivity analysis not only validates the reference value of the elbow plot for selecting ε, but also systematically enumerates multiple parameter combinations, providing a complete argument chain from single-threshold reference to optimal multi-parameter coordination for DBSCAN parameter selection. This ensures that the clustering results are both aligned with the inherent density of the data and meet the requirements of reasonable cluster numbers and structures, while also adapting to the special data characteristics of short sequences, high seasonality, and holiday (weekend) effects.
4.5 Performance of the FP-OWA method under missing data
To validate the performance of the FP-OWA method in scenarios with missing data, we set n = 100 and t = 50 and adopt the functional data generation mechanism in Eq 11. Moderate levels of the aforementioned types of noise are added to mimic perturbations in real data. We then consider three missing-data mechanisms—missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR)—and, for each mechanism, impose three missingness levels of 5%, 10%, and 20%. The results, assessed using the Kendall’s tau criterion, are reported in Fig 4.
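As an illustration of the experimental setup, the sketch below imposes MCAR missingness and fills the gaps by linear interpolation; the MAR/MNAR masks and the exact recovery step used in the paper are not reproduced, so both functions should be read as assumptions:

```python
import numpy as np

def impose_mcar(X, rate, rng=None):
    """Set a `rate` fraction of entries of X to NaN completely at random."""
    rng = np.random.default_rng(rng)
    Y = X.copy()
    Y[rng.random(X.shape) < rate] = np.nan
    return Y

def interpolate_rows(Y, t):
    """Linearly interpolate NaN gaps along each curve (row of Y) on grid t."""
    Z = Y.copy()
    for row in Z:
        ok = ~np.isnan(row)
        row[~ok] = np.interp(t[~ok], t[ok], row[ok])
    return Z
```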
As shown in Fig 4, regardless of the missing-data mechanism or the proportion of missingness, the method performs substantially better under Gaussian noise than under spike, amplitude, or Poisson noise, and the corresponding confidence intervals are noticeably narrower. From the perspective of missing-data mechanisms, the impact of the missing rate on Kendall’s tau is negligible under MCAR. For example, in the Gaussian-noise setting the Kendall’s tau values at 5%, 10%, and 20% missingness are 0.8360, 0.8376, and 0.8364, respectively, indicating that under completely random missingness the method can stably maintain ranking consistency. Under MAR, Kendall’s tau decreases slightly as the missing rate rises. In the spike-noise case, for instance, Kendall’s tau drops from 0.8121 at a 5% missing rate to 0.7893 at a 20% missing rate, reflecting the fact that MAR is related to the observed variables: the higher the missing rate, the more difficult it becomes to recover the lost information via interpolation, which in turn mildly affects ranking consistency. A similar pattern is observed under MNAR, where higher missing rates are also accompanied by lower Kendall’s tau values. For example, in the Laplace-noise setting, Kendall’s tau declines from 0.7827 at 5% missingness to 0.7742 at 20%, because MNAR is driven by unobserved characteristics and higher missingness leads to a greater loss of extreme values, slightly weakening ranking agreement. Overall, the spline smoothing component in the FP-OWA method effectively filters out noise and has little adverse impact on the segmentation and integration steps, yielding stable performance and rankings that remain highly consistent with the true order.
4.6 Ranking stability analysis
To assess the robustness of the proposed FP-OWA method and its variants for ranking functional data contaminated by outliers, we conduct further simulation studies. We adopt the functional data generation mechanism in Eq 11 and use the true ranking obtained from the integral of the underlying functions as the evaluation benchmark. The proportion of outliers is varied from 0% to 30%, and the perturbation magnitude is set to low, moderate, and high multiples of δ (with δ denoting the standard deviation). For each setting, a subset of samples is selected according to the specified proportion and contaminated with Gaussian noise of the corresponding amplitude to mimic data pollution. The performance of three methods—the baseline FP-OWA, the Huber-filtered FP-OWA [35], and the Hampel-filtered FP-OWA [36]—is evaluated using the Kendall’s tau statistic. The simulation results are visualized in Fig 5.
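A standard sliding-window Hampel filter, of the kind used in the Hampel-filtered FP-OWA variant [36], can be implemented as follows (window length and threshold k are illustrative defaults, not the paper's settings):

```python
import numpy as np

def hampel_filter(y, window=5, k=3.0):
    """Sliding-window Hampel filter.

    Replace y[i] by the local median whenever it deviates from that
    median by more than k * 1.4826 * MAD, where the median and MAD are
    computed over the window y[i-window : i+window+1].
    """
    y = np.asarray(y, dtype=float)
    out = y.copy()
    n = len(y)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        med = np.median(y[lo:hi])
        mad = 1.4826 * np.median(np.abs(y[lo:hi] - med))
        if mad > 0 and abs(y[i] - med) > k * mad:
            out[i] = med     # flagged as an outlier: replace by local median
    return out
```

A Huber counterpart would instead shrink, rather than replace, observations whose deviation exceeds the threshold, which explains its gentler behaviour at moderate contamination levels.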
From Fig 5, we can see that in functional data ranking tasks, the FP-OWA method with Huber or Hampel pre-filtering is markedly more robust to outliers than the baseline FP-OWA method. As both the proportion and the magnitude of outliers increase, the ranking stability of all methods declines, but the filtered versions are able to maintain a relatively high level of concordance even under severe contamination. Under the same outlier settings, all methods achieve higher Kendall’s tau values when n = 300 and t = 50 than when n = 100 and t = 30. This indicates that a larger sample size dilutes the influence of individual outliers, and a denser time grid captures the overall trend and periodic structure of the functional data more completely, thereby reducing the impact of local anomalies on the integral-based ranking. The FP-OWA variants are also more stable in the large-sample case, as the increased number of time points allows the filters to more accurately identify departures at abnormal time locations, further enhancing ranking stability. We say that a method “breaks down” when Kendall’s tau falls below 0.7, that is, when ranking stability deteriorates substantially. As shown in the figure, the effective breakdown point of the baseline FP-OWA is relatively low and strongly affected by sample size. When n = 100 and t = 30, the breakdown occurs when the outlier proportion is around 20%–30%; once this threshold is exceeded, Kendall’s tau quickly drops below 0.7. When n = 300 and t = 50, the breakdown point improves slightly to around 30%, but it still reflects limited resistance to outliers compared with the variant methods, suggesting that the baseline FP-OWA is fairly sensitive to high proportions or large magnitudes of contamination. By contrast, FP-OWA combined with Huber or Hampel filtering delays the breakdown point by roughly 10%–20% relative to the baseline FP-OWA.
The plots further show that the Huber-based variant is more stable at moderate contamination levels, whereas the Hampel-based variant performs better when outliers are more extreme. In summary, when the data contain a high proportion of outliers, applying Huber or Hampel filtering to correct extreme deviations before ranking can effectively mitigate their influence on the integral-based scores. The resulting FP-OWA variants provide a more reliable solution for robust ranking of functional data.
4.7 Prediction
In this experiment, Monte Carlo simulations are used to systematically evaluate the predictive performance of the FP-OWA method under various noise-contamination scenarios, with the aim of verifying its robustness and stability in ranking complex functional data. The experimental design continues to adopt the functional data generation scheme in Eq 11, to which different types and levels of noise are added in order to mimic realistic contamination settings. We set n = 300, t = 100, and use Spearman’s rho as the primary metric to quantify the association between the predicted ranking and the true ranking. The experimental results are presented in Fig 6.
The predictive performance of the FP-OWA method is strongly affected by both the type and the level of noise, while its runtime remains stable between 0.07 and 0.10 seconds. This indicates high computational efficiency and makes the method suitable for ranking tasks on moderately sized functional datasets. As shown in Fig 6, FP-OWA performs particularly well at low noise levels. At moderate noise levels, Laplace and Gaussian noise mainly manifest as global fluctuations that can be effectively attenuated by spline smoothing, so FP-OWA adapts well to discrete and heavy-tailed distributions in these settings. By contrast, the extreme spikes or scale shifts induced by impulse and amplitude noise may distort the MBD weights and the segmentation procedure, leading to reduced ranking stability. When the noise level is high, the performance of FP-OWA deteriorates but still remains reasonably robust. Future work could consider incorporating more robust clustering strategies or combining FP-OWA with nonparametric smoothing techniques to further enhance its ability to suppress strong random noise.
5 Empirical data analysis
5.1 Sources of data
In this section, we use the 24-hour daily average concentrations of PM2.5 and O3 in 13 cities of the Beijing–Tianjin–Hebei region from 1 January 2023 to 31 December 2023 as the objects of study. The data are obtained from the China Air Quality Online Monitoring and Analysis Platform (https://www.aqistudy.cn/historydata/), which provides real-time monitoring information but does not support direct data downloads. Therefore, the dataset used in this paper is taken from the Environmental Research Database of the China Research Data Service (CNRDS) Platform (https://www.cnrds.com/Home/Index#/FinanceDatabase/DB/CEDS/ViewName/).
From Fig 7, it can be seen that the temporal variation of PM2.5 concentrations in the Beijing–Tianjin–Hebei urban agglomeration exhibits a pronounced seasonal pattern. The main sources of PM2.5 include coal combustion, industrial emissions, fugitive dust, and secondary formation processes, and its concentration is strongly related to meteorological conditions and the regional energy consumption structure. In winter, PM2.5 levels rise sharply and exhibit multiple pronounced peaks. By contrast, summer is the cleanest season, during which the PM2.5 concentrations in all 13 cities reach their annual minimum in July. In spring, PM2.5 concentrations show intermittently high values and pronounced fluctuations. This is mainly due to frequent dust storms and the recovery of industrial production, which together lead to repeated pollution peaks; however, as precipitation increases and vegetation greens up, concentrations decline in the latter part of the season. In early autumn, meteorological conditions are generally favorable and the dispersion of pollutants is relatively strong. Later in the season, air temperature drops, temperature inversions begin to occur, and potential impacts from regional activities such as straw burning emerge, causing PM2.5 levels to gradually converge toward typical winter conditions. Overall, the seasonal pattern can be summarized as “high in winter and spring, low in summer and autumn”.
O3 is mainly formed through photochemical reactions between volatile organic compounds (VOCs) and nitrogen oxides (NOx) under solar radiation, and its concentration is therefore closely linked to air temperature and irradiance. As shown in Fig 8, the O3 concentration in the Beijing–Tianjin–Hebei urban agglomeration reaches its annual peak in summer; in some cities, daily O3 levels are strongly elevated on a number of days in June and July. In contrast, winter is the season with the lowest O3 concentrations, with most cities remaining at low levels throughout. During spring, O3 exhibits a persistent upward trend, climbing from the low values at the beginning of the season to the high-value range in early summer. In autumn, O3 concentrations gradually decline as temperature drops and solar radiation weakens, falling from high to low levels. Overall, the seasonal pattern can be summarized as “high in summer, low in winter, with spring and autumn as transition periods.” By comparing Figs 7 and 8, it is evident that PM2.5 and O3 in the 13 cities of the Beijing–Tianjin–Hebei region exhibit a pronounced inverse relationship over the annual cycle. This opposite pattern persists across all seasons and clearly reflects the typical pollution differentiation in the region, characterized by “winter haze and summer photochemical ozone.”
In terms of inter-city differences, the daily mean PM2.5 levels clearly follow a south–north gradient, with higher values in the south and lower values in the north. As shown in Fig 9, cities such as Handan and Xingtai fall into the relatively high concentration range, whereas Chengde, Zhangjiakou, and Qinhuangdao exhibit markedly lower average PM2.5 levels. Together with Fig 7, it can be seen that even during the pollution season the peak PM2.5 concentrations in these northern cities remain comparatively low, indicating lighter pollution.
The daily mean O3 levels show a “high in inland plain cities, low in coastal cities” configuration. As seen in Fig 10, high values are concentrated in the central and southern plains of Hebei and in areas with intensive industrial activity, with Zhangjiakou, Hengshui, and Cangzhou being representative examples. Under strong solar radiation and high summer temperatures, O3 formation in these cities is markedly more efficient than elsewhere, and their annual daily means rank among the highest in the region. Low values, by contrast, are mainly found in the northern mountainous areas of Hebei and the eastern coastal zone, typified by Chengde, Qinhuangdao, and Beijing, where O3 concentrations remain low throughout the year. Overall, the region exhibits a pronounced spatial contrast between PM2.5 and O3 concentration levels.
5.2 Application of the FP-OWA method
Using the FP-OWA method, we further analyzed the 2023 daily average concentrations of PM2.5 and O3 for the 13 cities in the Beijing–Tianjin–Hebei region, and the results are as follows.
Fig 11 shows both the differentiated ranking of PM2.5 concentrations among cities and the seasonal contribution pattern for each city. From the perspective of ranking heterogeneity, Handan, Xingtai, and Hengshui occupy the top positions in terms of PM2.5 levels within the urban agglomeration. This is mainly because these three cities are traditional high–energy-consumption, high-emission centres, with large industrial and agricultural emission bases. Combined with stagnant winds and temperature inversions in the hinterland of the North China Plain, pollutants tend to accumulate. In addition, these cities lie at key nodes of the PM2.5 transport corridor in the Beijing–Tianjin–Hebei region, so inflow from upwind areas further elevates local concentrations, jointly supporting their high rankings. By contrast, Zhangjiakou, Chengde, and Beijing fall into the lower quantiles of the regional PM2.5 distribution. Zhangjiakou and Chengde benefit from the mountainous terrain in northern Hebei, which enhances vertical dispersion, and their economies are dominated by eco-tourism and light industries with relatively low emission intensity. Although Beijing is a large metropolis, long-term stringent air-pollution control has substantially reduced local emissions, and the city is frequently influenced by cleaner air masses from the north, leading to weaker contributions from regional transport. These factors together result in the relatively low ranking of PM2.5 concentrations in these cities.
To assess the reliability of the above ranking results, we adopt a bootstrap procedure. Specifically, we first extract the residuals from the smoothed functional data and perform 200 random resamples with replacement, which are then added back to the fitted signals to generate bootstrap datasets contaminated with random noise. For each bootstrap sample, the FP-OWA algorithm is rerun to obtain a new ranking. The distribution of the 200 bootstrap rankings is finally summarized by boxplots, as shown in Fig 12, providing a visual representation of the uncertainty range associated with the ranking results.
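The residual-bootstrap procedure described above can be sketched as follows. This is a minimal illustration in Python (the study's own code is in R), and the `rank_cities` scorer below is a deliberately simplified stand-in, ranking by annual mean level, rather than the full FP-OWA pipeline; all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_cities(curves):
    """Illustrative stand-in scorer: rank curves by annual mean level
    (rank 1 = highest). FP-OWA itself aggregates depth-based piecewise
    weights instead of a simple mean."""
    means = curves.mean(axis=1)
    return np.argsort(np.argsort(-means)) + 1

def bootstrap_ranks(fitted, residuals, B=200):
    """Residual bootstrap: resample residuals over time with replacement,
    add them back to the fitted signals, and re-rank each
    noise-contaminated replicate."""
    n, T = fitted.shape
    ranks = np.empty((B, n), dtype=int)
    for b in range(B):
        idx = rng.integers(0, T, size=T)      # resampled time indices
        ranks[b] = rank_cities(fitted + residuals[:, idx])
    return ranks

# toy data: 5 "cities" observed over 365 "days", means 2 units apart
t = np.linspace(0, 1, 365)
fitted = np.stack([10 + 2 * i + 3 * np.sin(2 * np.pi * t) for i in range(5)])
residuals = rng.normal(0, 0.5, size=fitted.shape)
ranks = bootstrap_ranks(fitted, residuals)
print(np.median(ranks, axis=0))   # [5. 4. 3. 2. 1.]
```

The `(B, n)` matrix of bootstrap rankings is exactly what the boxplots in Fig 12 summarize: one distribution of ranks per city.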
As seen in Fig 12, the overall structure exhibits a clear diagonal distribution, which demonstrates that the FP-OWA method has strong discriminative power and effectively captures the structural differences in the data rather than random noise. Specifically, the box plots for the cities at the extremes of the ranking—Handan (ranked first) and Zhangjiakou (ranked last)—are relatively narrow, indicating good stability. However, for cities in the middle range, such as Tianjin and Tangshan, the boxes are wider and overlap, which reflects the homogenized competition in PM2.5 pollution control levels in the central part of the Beijing–Tianjin–Hebei region. This suggests that the performance differences among these cities are not statistically significant.
Fig 13 summarizes the seasonal–monthly contribution patterns of PM2.5 in the Beijing–Tianjin–Hebei region and the segmentation structure obtained with the FP-OWA method. From the perspective of contribution weights, the MBD-based weights exhibit a clear U-shaped pattern: December and January are the months with the highest pollution contribution, with a monthly share typically exceeding 20%, far above the other months. This highlights a “winter–autumn dominated” pattern of PM2.5 pollution in the region. From the viewpoint of temporal segmentation, the DBSCAN clustering algorithm automatically divides the year into four regimes: a winter accumulation period, a spring dust-fluctuation period, a summer low-pollution stable period, and an autumn transition period. The segment boundaries align closely with key turning points in the PM2.5 concentration series, such as the onset of coal-fired heating in winter, the increase in precipitation in summer, and the emergence of temperature inversions in autumn. This not only captures the common features of regional pollution, but also confirms the rationality and adaptability of the automatic segmentation strategy embedded in the FP-OWA method. Combining Fig 11 with the seasonal contribution shares for each city reveals substantial differences in both the strength of seasonal dominance and the underlying driving mechanisms. For example, Handan, as a national core base of steel production capacity, has a relatively high share of annual primary particulate emissions from sintering and blast-furnace ironmaking during winter. Combined with low-level emissions from residential coal heating, this forms a dual driving mechanism of industrial and coal-combustion sources. Moreover, steel production in autumn does not undergo seasonal shutdown, so particulate emissions persist, and the high frequency of stagnant winds weakens dispersion, leading to a secondary peak in contribution. 
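As a minimal illustration of the depth-weighting step described above, the following Python sketch computes the (naive, O(n²T)) modified band depth of a set of sample curves and normalizes it into contribution weights. It is not the paper's implementation, and the DBSCAN segmentation step is omitted.

```python
import numpy as np
from itertools import combinations

def mbd(curves):
    """Naive modified band depth with bands of J = 2 curves: for each
    sample curve, the average fraction of time points at which it lies
    inside the pointwise envelope of a pair of curves.  O(n^2 * T)."""
    n, T = curves.shape
    depth = np.zeros(n)
    for i, k in combinations(range(n), 2):
        lo = np.minimum(curves[i], curves[k])
        hi = np.maximum(curves[i], curves[k])
        depth += ((curves >= lo) & (curves <= hi)).mean(axis=1)
    return depth / (n * (n - 1) / 2)

# toy example: five parallel curves; the central one is deepest
t = np.linspace(0, 1, 100)
curves = np.array([np.sin(2 * np.pi * t) + s for s in (-2, -1, 0, 1, 2)])
depth = mbd(curves)
weights = depth / depth.sum()     # normalized into contribution weights
print(np.argmax(depth))           # 2 -> the central curve is deepest
```

Curves near the center of the sample band receive the largest depth and hence the largest normalized weight, which is the mechanism behind the month-level weight profiles discussed here.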
In Hengshui, the winter share of PM2.5 emissions is the highest in the region, reaching 51.6%. The city lies in the central plain, where winter stagnation is frequent and pollutants tend to be trapped in local recirculation. As a major agricultural city, autumn straw burning combined with chemical emissions pushes the autumn share up to 36.6%. In Beijing, by contrast, the winter and autumn shares of PM2.5 emissions are perfectly balanced at 34.5% each. Residential coal use has essentially been eliminated, winter emissions are dominated by traffic, and regional joint prevention and control measures help curb cross-boundary transport. There is no straw burning in autumn, and the main pollution source is inflow from surrounding areas, leading to the roughly symmetric winter–autumn contributions. Increased precipitation and dust retention by vegetation further reduce concentrations in spring. For Chengde and Zhangjiakou, the winter share is relatively high (50% and 44.4%, respectively), but the absolute concentration level is only about one third to one half of that in the high-pollution cities. Both cities are dominated by eco-tourism and light industry, with generally low industrial emission intensity. Although coal-fired heating is still used in winter, it is largely replaced by clean energy, and the mountainous terrain brings frequent cold-air activities and strong vertical dispersion of particulates. As a result, the winter contribution share is high, but the actual PM2.5 concentration remains relatively low.
Unlike the spatial pattern of PM2.5, the ranking of O3 concentrations in Fig 14 reveals two distinct clusters. Cangzhou, Hengshui, and Zhangjiakou form a high-O3 cluster, mainly distributed across the central–southern plains and the transitional zone to northern Hebei. Chengde, Tangshan, and Qinhuangdao constitute a low-O3 cluster, covering the northern mountainous areas, heavy-industry cities, and coastal zones. The key advantage of the high-value cluster lies in the ample supply of precursors and favorable photochemical conditions. For example, Cangzhou and Hengshui are major centres of chemical and agricultural industries, with large baseline emissions of VOCs and NOx; Zhangjiakou, although relatively weak in industrial emissions, experiences an early onset of photochemical activity in spring, and its local topography is conducive to O3 accumulation. Together, these factors support their high rankings. By contrast, Chengde benefits from strong dispersion associated with its mountainous terrain and low precursor emissions, so O3 is more easily diluted. Tangshan, despite strong photochemical activity in summer, has high NOx emissions that trigger the “NOx titration effect”, which consumes O3. Qinhuangdao is influenced by a maritime climate, where lower temperatures and higher humidity suppress photochemical reactions. These factors jointly lead to relatively low O3 concentrations in the low-value cluster.
Fig 15 displays the stability of O3 rankings based on 200 bootstrap resamplings, using boxplots. From the figure, it can be seen that the overall ranking follows a clear stepped distribution, demonstrating that the FP-OWA method has strong discriminative power. Cangzhou, ranked at the top, and Chengde, ranked at the bottom, exhibit narrow confidence intervals, indicating that their rankings are statistically stable and not affected by data fluctuations. Cities in the middle of the ranking, such as Tianjin and Shijiazhuang, show considerable overlap in their boxplots, with wider spans. This objectively reflects the high homogeneity in O3 governance levels among central cities in the Beijing–Tianjin–Hebei region, where competition is intense and performance differences are not statistically significant.
From the perspective of seasonal and monthly contribution patterns in Fig 16, the O3 weights obtained by the MBD method exhibit an inverted U-shaped profile that is almost the opposite of that for PM2.5. March shows the largest contribution, accounting for about 21.9% of the annual total, and marks the transition from winter to spring in the Beijing–Tianjin–Hebei region, when the solar elevation angle rises rapidly and daylight hours increase markedly. June and July are the core months for O3 pollution, with monthly weights as high as 15%–19%, indicating a pronounced seasonal concentration, while the contributions of the remaining months are relatively small, generally below 7%. The segmentation in Fig 16 is obtained automatically based on local data density, clearly distinguishing a summer photochemical peak period, a spring precursor accumulation period, a winter low-activity trough, and an autumn diffusion–transition period. These segments align closely with the key drivers of O3 formation, providing further evidence for the effectiveness of the dynamic segmentation strategy in the FP-OWA method. Because O3 formation depends on precursor supply and photochemical energy, combining Fig 14 with the seasonal contribution shares of each city suggests that the cities in the region can be grouped into four types: summer-dominated, spring–summer co-dominated, spring-dominated, and autumn-special-dominated. For summer-dominated cities, the defining feature is that summer O3 accounts for more than 50% of the annual total, significantly higher than in other seasons, with Xingtai and Shijiazhuang as typical examples. Xingtai is a mixed industrial–agricultural city in southern Hebei, where VOCs emissions from chemical industrial parks are high in summer; combined with high temperatures and strong solar radiation, photochemical reaction rates are markedly higher than in spring. 
Shijiazhuang, the provincial capital, experiences the superposition of traffic NOx emissions and VOCs emissions from surrounding chemical industries. Located on the eastern foothills of the Taihang Mountains, the city is affected by föhn-like warming in summer, which further enhances photochemical activity, making summer the season with the largest contribution share. The spring–summer co-dominated type is characterized by a combined spring-plus-summer share exceeding 70%, indicating strong seasonal continuity. Cangzhou, a major base for petroleum and chlor-alkali industries, shows this pattern: springtime resumption of production boosts VOCs and NOx emissions, while in summer, high temperatures and strong radiation further increase VOCs emissions from chemical parks, and the high frequency of stagnant conditions over the plains favors O3 accumulation; together, spring and summer account for 79.5% of the total. Hengshui, with its distinctive rubber, fiberglass, and agricultural sectors, also falls into this category. In spring, VOCs emissions from rubber vulcanization and NOx emissions from fertilizer application provide abundant precursors; in summer, industrial parks release large amounts of VOCs that efficiently react with traffic-related NOx, and the spring–summer share reaches 82.2%. Spring-dominated cities are those where the spring O3 share exceeds that of summer. Zhangjiakou is a representative high-plateau city in northern Hebei. Rapid warming in spring triggers photochemical activity earlier than in surrounding areas, and the relatively enclosed topography of the Bashang Plateau leads to a high frequency of stagnant conditions, making local O3 accumulation easier. Although summer temperatures are higher, evapotranspiration from grassland vegetation increases humidity and suppresses photochemical reactions, so the spring dominance is pronounced. Chengde, an ecological city in the northern mountains, also belongs to this type. 
Strong springtime insolation and photochemical activity are offset by a local economy dominated by eco-tourism and light industry, resulting in limited VOCs and NOx precursor emissions. In summer, more frequent rainfall enhances wet deposition and removes precursors, while strong vertical mixing in mountainous terrain promotes dispersion. Consequently, Chengde has the highest spring share among all cities in the region, but the lowest absolute O3 concentration. The autumn-special-dominated type is characterized by a markedly higher autumn share than in other seasons, and within the Beijing–Tianjin–Hebei urban agglomeration there is only one such city: Qinhuangdao. As a coastal city, Qinhuangdao experiences strong marine humidity in summer, which suppresses photochemical reactions and leads to lower O3 levels than in inland areas. In autumn, before frequent intrusions of cold air from the north, wind speeds decrease relative to summer, slowing O3 dispersion. At the same time, autumn is the tourism off-season, so traffic NOx emissions decline, while industrial VOCs transported from Tangshan and Tianjin still provide a supply of precursors under mild photochemical conditions. As a result, autumn accounts for 44.4% of the annual O3 contribution, making Qinhuangdao the only city where autumn is the dominant season. However, due to generally low photochemical efficiency throughout the year, its overall O3 level ranks only 11th among the 13 cities.
In summary, the ranking differentiation and seasonal contribution patterns of PM2.5 and O3 concentrations in the Beijing–Tianjin–Hebei urban agglomeration exhibit pronounced spatial gradients and seasonal regularities. These findings provide a systematic basis for addressing the dual pollution patterns of PM2.5—characterized by “higher in the south, lower in the north, dominated by winter and autumn”—and O3—characterized by “higher on the plains, lower in coastal areas, dominated by summer and spring”. To improve regional air quality in an integrated manner, priority should be given to establishing a joint prevention and control mechanism that operates across administrative boundaries and to advancing comprehensive management of pollution sources, thereby accelerating the formation of a low-carbon regional development model. Concretely, this requires the creation of a unified environmental standards system, the improvement of ecological compensation schemes, and the promotion of green industrial transformation, so as to achieve coordinated improvement and balanced development of environmental quality across the region.
6 Conclusion
The FP-OWA method proposed in this study offers a new perspective for ranking complex functional data. By integrating spline smoothing, depth-based analysis, and rank statistics, it effectively improves both the accuracy and stability of the ranking results. In terms of methodological performance, the simulation study systematically evaluates FP-OWA under five types of noise, three missing-data mechanisms, and outlier proportions ranging from 0% to 30%. Under low-noise conditions, FP-OWA clearly outperforms FPCA, which suffers from low rank concordance, the computationally intensive WLR method, and the depth-based h-MD and RTD methods, which exhibit systematic bias. Even in high-noise settings or under severe missingness, FP-OWA still maintains a noticeable advantage. Moreover, when combined with Huber or Hampel pre-filtering, the breakdown point with respect to outliers is delayed by about 10%–20%, providing strong evidence of the method’s stability in the presence of complex data perturbations. From an applied perspective, employing FP-OWA to rank the daily mean PM2.5 and O3 concentrations in the Beijing–Tianjin–Hebei urban agglomeration allows us to accurately uncover the spatio-temporal heterogeneity of regional air pollution, thereby supplying robust technical support and data evidence for environmental governance and policy-making.
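The Hampel pre-filtering mentioned above is typically a rolling median/MAD rule: a point lying more than k scaled median absolute deviations from its local median is replaced by that median. A minimal Python sketch follows; the window length and threshold `k` are illustrative values, not the settings used in this study.

```python
import numpy as np

def hampel(x, window=7, k=3.0):
    """Rolling Hampel filter: replace points lying more than k scaled
    MADs from their local median with that median (1.4826 makes the
    MAD consistent for Gaussian noise)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    half = window // 2
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        med = np.median(seg)
        mad = 1.4826 * np.median(np.abs(seg - med))
        if mad > 0 and abs(x[i] - med) > k * mad:
            y[i] = med
    return y

# a smooth daily series with two injected spikes
t = np.arange(60)
series = 50 + 10 * np.sin(2 * np.pi * t / 30)
series[10] += 80
series[40] -= 80
cleaned = hampel(series, window=7, k=3.0)
# only the two spikes are replaced; the smooth points pass through
```

Applying such a filter before smoothing removes isolated gross outliers while leaving the seasonal signal intact, which is consistent with the delayed breakdown point reported above.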
Although the FP-OWA method exhibits excellent computational efficiency at the sample sizes considered in the simulation study, the DBSCAN clustering and MBD depth modules embedded in the framework still face potential computational and theoretical challenges when dealing with ultra-large-scale or high-dimensional functional data. First, there is the scalability issue of MBD depth computation. Although MBD is simpler than traditional bandwidth-based depth measures, its core logic still relies on pairwise comparisons or higher-order combinations of sample curves, so the time complexity is essentially on the order of O(n²). As the sample size n grows to massive scales, this quadratic increase in computational cost leads to a substantial rise in runtime. In addition, when the sampling frequency of high-dimensional curves (dimension T) is very high, the cost of each integral comparison accumulates, which restricts the responsiveness of the algorithm in real-time or high-frequency streaming applications. Second, there is the question of how suitable DBSCAN is in high-dimensional feature spaces. In this paper, DBSCAN is used for feature partitioning and relies on distance measures between samples (such as Euclidean distance) to define density. However, when facing high-dimensional functional data or inflated high-dimensional representations, the method inevitably encounters the curse of dimensionality: data distributions become sparse and the differences between pairwise distances are blurred, making density-based cluster structures difficult to identify accurately. At the same time, in the absence of efficient spatial indexing structures, the neighborhood search in DBSCAN also requires an O(n²)-scale distance matrix, which further constrains the applicability of FP-OWA in ultra-high-dimensional, complex data settings.
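For the first bottleneck, an exact speed-up already exists for bands of two curves: the rank-based identity of Sun, Genton and Nychka [34] reduces the cost from quadratic pairwise comparison to O(nT log n). The Python sketch below (not the paper's R implementation) assumes no pointwise ties among curves:

```python
import numpy as np

def fast_mbd(curves):
    """Exact modified band depth for two-curve bands in O(n T log n),
    following the rank identity of Sun, Genton & Nychka [34]: with
    pointwise rank r (no ties), the number of bands covering a curve
    at a time point is (r - 1)(n - r) + (n - 1); average over time
    and divide by the number of bands C(n, 2)."""
    n, T = curves.shape
    r = curves.argsort(axis=0).argsort(axis=0) + 1   # ranks 1..n per time point
    covered = (r - 1) * (n - r) + (n - 1)
    return covered.mean(axis=1) / (n * (n - 1) / 2)

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 200)).cumsum(axis=1)   # 50 rough sample paths
d = fast_mbd(x)
print(d.min() > 0 and d.max() < 1)              # depths lie in (0, 1)
```

Because only pointwise sorting is required, this formulation also parallelizes naturally over time points, which is relevant to the distributed-computing direction discussed below.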
Given both the promise demonstrated by FP-OWA and the challenges it faces in handling ultra–large or high-dimensional data, future work can proceed along at least three directions. First, to address the quadratic growth in the cost of MBD depth computation, it would be useful to develop fast approximate depth algorithms that substantially reduce computational complexity while preserving ranking accuracy, and to design parallel MBD algorithms within a distributed computing framework to enable efficient processing of massive functional datasets. Second, to alleviate DBSCAN’s failure in high-dimensional feature spaces due to the curse of dimensionality, future research may explore manifold learning techniques as alternatives to traditional Euclidean distance, so as to capture more precisely the intrinsic nonlinear geometry of functional data in high-dimensional spaces. Third, since the current FP-OWA framework is mainly designed for univariate functional data, subsequent work will aim to extend it to multivariate functional settings by constructing multivariate feature aggregation operators that simultaneously capture cross-variable correlations and spatio-temporal dependencies, thereby further validating and enhancing the practical utility of the method.
Supporting information
S1 Appendix. Proof process of asymptotic properties.
https://doi.org/10.1371/journal.pone.0342192.s001
(PDF)
S2 Code. The complete simulation workflow involved in this study has been implemented in R (version 4.3.0).
The simulation source code (FP-OWA.R) has been deposited in a GitHub repository (https://github.com/Ly-sxmb/FP-OWA). The code structure comprises functional data generation, noise addition, implementation of five comparative methods, and parallelized performance evaluation, all of which correspond directly to the methodological procedures described in the manuscript.
https://doi.org/10.1371/journal.pone.0342192.s002
(R)
S3 Raw Data. The raw data used in the empirical study have been deposited in a GitHub repository (https://github.com/Ly-sxmb/FP-OWA).
https://doi.org/10.1371/journal.pone.0342192.s003
(CSV)
Acknowledgments
The completion of this research owes much to the generous support and insightful guidance of many, and we hereby express our sincere gratitude. We are particularly indebted to our corresponding author, Professor Maozai Tian, for his unwavering support. His patient and comprehensive guidance covered every stage of this work, from the initial conceptualization and research direction to the logical organization of the manuscript, the refinement of theoretical arguments, and the detailed responses to peer review comments. We also gratefully acknowledge the significant contributions of Dr. Xiaoxue Hu. Her deep involvement was instrumental during the research phase, particularly in optimizing the design of numerical experiments, compiling the pollutant concentration datasets, and revising the preliminary draft. Furthermore, we thank the reviewers for their constructive comments, which greatly improved this paper. We also appreciate the efficiency and professionalism demonstrated by the editors and the editorial staff throughout the review and publication process. Finally, our sincere thanks go to all parties who have facilitated this research and its publication.
References
- 1. An Z, Huang R-J, Zhang R, Tie X, Li G, Cao J, et al. Severe haze in northern China: a synergy of anthropogenic emissions and atmospheric processes. Proc Natl Acad Sci USA. 2019;116(18):8657–66.
- 2. Li G, Fang C, Wang S, Sun S. The effect of economic growth, urbanization, and industrialization on fine Particulate Matter (PM2.5) concentrations in China. Environ Sci Technol. 2016;50(21):11452–9. pmid:27709931
- 3. Wang Z, Li J, Liang L. Spatio-temporal evolution of ozone pollution and its influencing factors in the Beijing-Tianjin-Hebei Urban agglomeration. Environmental Pollution. 2020;256:113419.
- 4. Sheng Q, Hong ZM, Chen NZ. Spatial distribution characteristics of PM2.5 and its influencing factors in Beijing-Tianjin-Hebei region. Environmental Protection Science. 2023;49(5):68–75.
- 5. Su MQ, Shi YS. Spatial and temporal distribution characteristics and influential factors of PM2.5 pollution in Beijing-Tianjin-Hebei. Journal of University of Chinese Academy of Sciences. 2024;41(3):334–44.
- 6. Zhang Y, Ma Z, Gao Y, Zhang M. Impacts of the meteorological condition versus emissions reduction on the PM2.5 concentration over Beijing–Tianjin–Hebei during the COVID-19 lockdown. Atmospheric and Oceanic Science Letters. 2021;14(4):100014.
- 7. Chang X, Wang S, Zhao B, Xing J, Liu X, Wei L, et al. Contributions of inter-city and regional transport to PM2.5 concentrations in the Beijing-Tianjin-Hebei region and its implications on regional joint air pollution control. Sci Total Environ. 2019;660:1191–200. pmid:30743914
- 8. Ramsay JO, Silverman BW. Functional data analysis. New York: Springer; 2005. https://doi.org/10.1007/b98888
- 9. Ramsay JO, Silverman BW. Applied functional data analysis: methods and case studies. New York: Springer; 2002. https://doi.org/10.1007/b98886
- 10. Ferraty F, Vieu P. Nonparametric functional data analysis. New York: Springer; 2006. https://doi.org/10.1007/0-387-36620-2
- 11. Horváth L, Kokoszka P. Inference for functional data with applications. New York: Springer; 2012. https://doi.org/10.1007/978-1-4614-1435-8
- 12. Cao RH. The Fourier transform and its applications. Pure Mathematics. 2014;4(4):138–43. http://dx.doi.org/10.12677/PM.2014.44021
- 13. Chen HL, Hu XX. Functional principal component clustering algorithm under local linearity. Statistics and Decision. 2024;40(5):39–44.
- 14. Bellman R, Kalaba R. On adaptive control processes. IRE Trans Automat Contr. 1959;4(2):1–9.
- 15. Jing L, Ma WJ, Chang DH. Gesture acceleration signals recognition based on dynamic time warping. Chinese Journal of Sensors and Actuators. 2012;25(1):72–6.
- 16. Eiter T, Mannila H. Computing discrete Fréchet distance. 1994. https://www.kr.tuwien.ac.at/staff/eiter/et-archive/files/cdtr9464.pdf
- 17. Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P, Keogh E. Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Disc. 2012;26(2):275–309.
- 18. Tukey JW. Mathematics and the picturing of data. In: Proceedings of ICM. 1975;2:523–31. https://files.boazbarak.org/misc/mltheory/tukey_median.pdf
- 19. López-Pintado S, Romo J. On the concept of depth for functional data. Journal of the American Statistical Association. 2009;104(486):718–34.
- 20. López-Pintado S, Romo J. Depth-based inference for functional data. Computational Statistics & Data Analysis. 2007;51(10):4957–68.
- 21. Crainiceanu CM, Goldsmith AJ. Bayesian functional data analysis using WinBUGS. J Stat Soft. 2010;32(11).
- 22. Cuevas A, Febrero M, Fraiman R. Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics. 2007;22(3):481–96.
- 23. Cuesta-Albertos JA, Nieto-Reyes A. The random Tukey depth. Computational Statistics & Data Analysis. 2008;52(11):4979–88.
- 24. Sun Y, Genton MG. Functional boxplots. Journal of Computational and Graphical Statistics. 2011;20(2):316–34.
- 25. King MC, Staicu A-M, Davis JM, Reich BJ, Eder B. A functional data analysis of spatiotemporal trends and variation in fine particulate matter. Atmos Environ. 2018;184:233–43. pmid:33716545
- 26. Elayouty A, Abou-Ali H. Functional data analysis of the relationship between electricity consumption and climate change drivers. J Appl Stat. 2022;50(10):2267–85. pmid:37434625
- 27. Notter DA. Life cycle impact assessment modeling for particulate matter: a new approach based on physico-chemical particle properties. Environ Int. 2015;82:10–20. pmid:26001495
- 28. Akopov AS, Beklaryan LA, Saghatelyan AK. Agent-based modelling for ecological economics: a case study of the Republic of Armenia. Ecological Modelling. 2017;346:99–118.
- 29. Lin Z, Zhou Y. Ranking of functional data in application to worldwide PM10 data analysis. Environ Ecol Stat. 2017;24(4):469–84.
- 30. Cuevas A, Febrero M, Fraiman R. Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics. 2007;22(3):481–96.
- 31. Sander J, Ester M, Kriegel H-P, Xu X. Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Mining and Knowledge Discovery. 1998;2(2):169–94.
- 32. Wang J-L, Chiou J-M, Müller H-G. Functional data analysis. Annu Rev Stat Appl. 2016;3(1):257–95.
- 33. Han T, Peng Q, Zhu Z, Shen Y, Huang H, Abid NN. A pattern representation of stock time series based on DTW. Physica A: Statistical Mechanics and its Applications. 2020;550:124161.
- 34. Sun Y, Genton MG, Nychka DW. Exact fast computation of band depth for large functional datasets: how quickly can one million curves be ranked? Stat. 2012;1(1):68–74.
- 35. Li Z, Guan S. Diffusion normalized Huber adaptive filtering algorithm. Journal of the Franklin Institute. 2018;355(8):3812–25.
- 36. Roos-Hoefgeest Toribio M, Garnung Menéndez A, Roos-Hoefgeest Toribio S, Álvarez García I. A novel approach to speed up hampel filter for outlier detection. Sensors (Basel). 2025;25(11):3319. pmid:40968853