
A self-normalization and support vector regression based approach for detecting structural change points in time series

  • Nini Xie

    Roles: Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    1908542432@qq.com

    Affiliations: Xingzhi College of Xi’an University of Finance and Economics, Xi’an, Shaanxi, China; School of Mathematics and Statistics, Qinghai Normal University, Xining, Qinghai, China

Abstract

Background

The detection of structural change points in time series is a fundamental problem in statistical analysis, with significant implications across numerous scientific disciplines. Traditional change-point detection methods often face challenges in consistently estimating the long-run variance of time series, which can limit their practical application.

Methodology/Principal findings

This paper introduces a novel change-point detection methodology that integrates Support Vector Regression (SVR) with a self-normalization framework. By leveraging SVR's flexible modeling capabilities to obtain accurate residual estimates and employing a self-normalized test statistic, our approach circumvents the need for long-run variance estimation. Under the null hypothesis of no structural change, the test statistic converges to a non-degenerate limiting distribution, while under the alternative hypothesis, it diverges to infinity, ensuring consistent detection power. Extensive simulation studies demonstrate that our method outperforms existing SVR-based tests in finite-sample performance, offering improved size control (empirical size close to nominal 0.05 level) and higher detection power across various scenarios. Empirical applications to hydrological and financial time series (Nile River flow data and Nikkei 225 index) validate the method's practical utility in real-world settings.

Conclusions/Significance

The proposed framework provides a robust, parameter-free tool for analyzing structural instability in time series, with particular advantages in handling complex, nonlinear data structures. The method’s avoidance of tuning parameters and consistent performance across different domains suggest broad applicability in scientific research and practical applications.

Introduction

The identification of structural change points in time series data represents a fundamental challenge in statistical analysis with far-reaching implications across scientific disciplines. These change points—moments when the underlying data-generating process undergoes significant alteration—provide critical insights into system dynamics, regime shifts, and anomalous events. In epidemiological monitoring, change-point detection can signal shifts in disease transmission rates, enabling timely public health interventions [1]. Financial analysts rely on structural break detection to identify market regime changes and economic turning points [2], while environmental scientists use these methods to detect climate pattern shifts and extreme weather events [3].

Despite considerable methodological advances since Page’s pioneering work [4], traditional change-point detection methods face persistent challenges. Most conventional approaches require consistent estimation of long-run variance to derive valid asymptotic distributions for test statistics [5,6]. This estimation typically involves selecting bandwidth parameters that substantially influence test performance [7], creating a source of potential instability in practical applications. The emergence of complex, high-dimensional datasets in the big data era has further exacerbated these challenges, as modern time series often exhibit nonlinear patterns, heterogeneous variance, and multiple change-point types that traditional methods struggle to detect reliably [8].

Recent methodological innovations have sought to address these limitations through machine learning approaches. Support Vector Regression (SVR) has emerged as a particularly promising technique due to its flexibility in capturing complex, nonlinear relationships with minimal tuning requirements [9]. Simultaneously, self-normalization methods have gained attention for their ability to provide robust statistical inference without requiring explicit long-run variance estimation [10]. However, the integration of these two powerful approaches for change-point detection remains largely unexplored.

This paper bridges this methodological gap by developing a unified framework that combines SVR’s modeling flexibility with the statistical robustness of self-normalization. Our approach specifically targets the ARMA model class, which provides a flexible yet parsimonious framework for modeling dependent data while maintaining interpretability. The proposed method offers several advantages over existing approaches: it eliminates the need for bandwidth selection and other tuning parameters that often complicate traditional methods; it leverages SVR’s capability to capture complex data patterns; and it provides a solid theoretical foundation for statistical inference.

We establish three primary contributions. First, we develop a novel change-point detection methodology that integrates SVR with self-normalization in a theoretically grounded framework. Second, we derive the asymptotic properties of the proposed test statistic, proving its convergence under the null hypothesis and consistency under alternatives. Third, we demonstrate through extensive simulations and real-world applications that our method outperforms existing approaches while maintaining practical applicability across diverse domains.

Materials and methods

Model specification and hypotheses

We consider the stationary ARMA(p, q) model as our baseline framework. Let {X_t} represent the observed time series, which follows the general form:

X_t = ϕ_1 X_{t−1} + ⋯ + ϕ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + ⋯ + θ_q ε_{t−q},

where ϕ_1, …, ϕ_p and θ_1, …, θ_q are real-valued parameters, and {ε_t} represents a sequence of independent and identically distributed random variables with mean zero and variance σ².

Let β = (ϕ_1, …, ϕ_p, θ_1, …, θ_q)′ denote the complete parameter vector, and let β_t denote its value at time t. The change-point detection problem is formalized through the following hypothesis framework:

  • Null hypothesis (H₀): No structural change exists, implying parameter constancy throughout the observation period: β_t = β for t = 1, …, n.
  • Alternative hypothesis (H₁): A single change point exists at location k, producing a structural break in the parameter vector:

β_t = β for t = 1, …, k,

β_t = β + δ for t = k + 1, …, n,

where δ ≠ 0 represents a fixed vector of parameter changes.

Residual estimation via SVR-ARMA

The self-normalization approach requires accurate residual estimates, which we obtain through an SVR-ARMA framework. Given parameter estimates ϕ̂_1, …, ϕ̂_p and θ̂_1, …, θ̂_q, the residuals are computed recursively as:

ε̂_t = X_t − Σ_{i=1}^p ϕ̂_i X_{t−i} − Σ_{j=1}^q θ̂_j ε̂_{t−j},  t = 1, …, n,

with initial conditions X_t = ε̂_t = 0 for all t ≤ 0. The parameter estimates ϕ̂ and θ̂ are obtained via Gaussian quasi-maximum likelihood estimation [11], ensuring consistency under standard regularity conditions. Specifically, we use the statsmodels library (version 0.14.0) in Python to fit the ARMA(p, q) model via the ARMA.fit() method with method = 'mle', which implements Gaussian quasi-maximum likelihood estimation. The integration of SVR enhances the framework's capacity to capture potential nonlinearities and complex dependencies that may be present in the time series.
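As an illustration, the recursive residual computation can be transcribed in plain NumPy (a minimal sketch; the function name and interface are our own illustration, not part of the released S1 Code):

```python
import numpy as np

def arma_residuals(x, phi, theta):
    """Recursive ARMA(p, q) residuals, with pre-sample values of X_t
    and eps_t for t <= 0 set to zero.

    x     : observed series X_1, ..., X_n
    phi   : AR coefficient estimates (length p)
    theta : MA coefficient estimates (length q)
    """
    n = len(x)
    eps = np.zeros(n)
    for t in range(n):
        # AR part: sum over the lags of X that are actually available
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        # MA part: sum over the lags of the residuals computed so far
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        eps[t] = x[t] - ar - ma
    return eps
```

For a noiseless AR(1) series generated with coefficient 0.5, the recursion returns zero residuals after the first observation, as expected.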

Self-normalization test statistic

We construct the self-normalization test statistic using a cumulative sum approach based on the estimated residuals ε̂_1, …, ε̂_n. For a candidate change point k, we define the segment means and cumulative sums as follows.

For 1 ≤ j ≤ k and k < j ≤ n, respectively:

ε̄_{1,k} = (1/k) Σ_{t=1}^k ε̂_t,  ε̄_{2,k} = (1/(n−k)) Σ_{t=k+1}^n ε̂_t,

S_{1,k}(j) = Σ_{t=1}^j (ε̂_t − ε̄_{1,k}),  S_{2,k}(j) = Σ_{t=j}^n (ε̂_t − ε̄_{2,k}).

For each potential change point location k, we define the self-normalized statistic:

T_n(k) = D_n(k)² / V_n(k),

with numerator D_n(k) = n^{−1/2} (k(n−k)/n) (ε̄_{1,k} − ε̄_{2,k}) and self-normalizer

V_n(k) = n^{−2} [ Σ_{j=1}^k S_{1,k}(j)² + Σ_{j=k+1}^n S_{2,k}(j)² ].

The final test statistic represents the supremum over a trimmed search region:

T_n = sup_{⌊nτ₁⌋ ≤ k ≤ ⌊nτ₂⌋} T_n(k),

where 0 < τ₁ < τ₂ < 1 are trimming parameters that exclude the boundaries of the sample. We employ fixed trimming values throughout our analysis to ensure sufficient observations in each segment for reliable estimation while maintaining reasonable computational efficiency.
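The statistic can be transcribed directly in code (a minimal NumPy sketch under the notation above; the function name and the trimming defaults of 0.1 and 0.9 are illustrative assumptions, not the paper's tuned settings):

```python
import numpy as np

def sn_statistic(eps, tau1=0.1, tau2=0.9):
    """Self-normalized sup statistic over a trimmed set of candidate
    change points, applied to a residual series eps."""
    n = len(eps)
    best, khat = -np.inf, None
    for k in range(max(2, int(tau1 * n)), min(n - 2, int(tau2 * n)) + 1):
        m1, m2 = eps[:k].mean(), eps[k:].mean()          # segment means
        d = (k * (n - k) / n) * (m1 - m2) / np.sqrt(n)   # numerator D_n(k)
        s1 = np.cumsum(eps[:k] - m1)                     # S_{1,k}(j), j <= k
        s2 = np.cumsum((eps[k:] - m2)[::-1])[::-1]       # S_{2,k}(j), j > k
        v = (np.sum(s1 ** 2) + np.sum(s2 ** 2)) / n ** 2 # self-normalizer V_n(k)
        t = d ** 2 / v
        if t > best:
            best, khat = t, k
    return best, khat
```

The maximizing index doubles as the change-point estimate, matching steps (7)–(8) of Algorithm 1 below.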

Algorithm implementation

We formalize the complete change-point detection procedure in Algorithm 1:

Algorithm 1: SVR-SN Change-Point Detection

Input: Time series X_1, …, X_n; trimming parameters τ₁, τ₂.

Output: Test statistic T_n; change-point estimate k̂.

(1) Fit the SVR-ARMA model to the time series using Gaussian quasi-maximum likelihood estimation to obtain parameter estimates ϕ̂ and θ̂, then compute the residuals ε̂_t using Eq (3).

(2) for k = ⌊nτ₁⌋ to ⌊nτ₂⌋ do

(3) Compute ε̄_{1,k} and ε̄_{2,k}.

(4) Calculate the partial sum processes S_{1,k}(j) and S_{2,k}(j).

(5) Compute T_n(k) using the equation above.

(6) end for

(7) T_n ← max_k T_n(k).

(8) k̂ ← argmax_k T_n(k).

(9) return T_n, k̂.

The Python implementation of the proposed SVR-SN algorithm, including code for simulations and empirical applications, is available in S1 Code.

Comparative methods

For performance benchmarking, we compare our method against two established SVR-based tests proposed by Lee et al. [12]. The first comparative method employs a likelihood-score based CUSUM statistic, and the second a maximum-type statistic; both are standardized by corresponding long-run variance estimators, with exact forms as given in Lee et al. [12].

We reject H₀ at significance level α = 0.05 when T_n exceeds c_α, where c_α represents the critical value obtained via Monte Carlo simulation.
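The Monte Carlo calibration of c_α can be sketched as follows, approximating the null distribution of the self-normalized statistic with i.i.d. Gaussian residuals (the grid size, replication count, and trimming values here are illustrative assumptions, not the settings behind the reported critical values):

```python
import numpy as np

def mc_critical_value(alpha=0.05, n=200, reps=300, tau1=0.1, tau2=0.9, seed=1):
    """Simulate sup_k T_n(k) under H0 (i.i.d. N(0,1) residuals) and return
    the empirical (1 - alpha) quantile as the critical value c_alpha."""
    rng = np.random.default_rng(seed)
    sups = np.empty(reps)
    for r in range(reps):
        e = rng.standard_normal(n)
        best = 0.0
        for k in range(int(tau1 * n), int(tau2 * n) + 1):
            m1, m2 = e[:k].mean(), e[k:].mean()
            d = (k * (n - k) / n) * (m1 - m2) / np.sqrt(n)    # numerator
            s1 = np.cumsum(e[:k] - m1)                        # left partial sums
            s2 = np.cumsum((e[k:] - m2)[::-1])[::-1]          # right partial sums
            v = (np.sum(s1 ** 2) + np.sum(s2 ** 2)) / n ** 2  # self-normalizer
            best = max(best, d ** 2 / v)
        sups[r] = best
    return float(np.quantile(sups, 1.0 - alpha))
```

Because the self-normalized statistic is scale-free, the N(0,1) choice for the simulated residuals is innocuous; only the grid size and replication count affect the approximation error.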

Theoretical results

Theoretical properties

We first establish the theoretical foundation of our approach through two key theorems.

Assumption 1. The time series {X_t} is strictly stationary and ergodic with E|ε_t|^{2+δ} < ∞ for some δ > 0. The parameter estimators ϕ̂ and θ̂ are √n-consistent, and the estimated residuals satisfy sup_{1≤k≤n} n^{−1/2} |Σ_{t=1}^k (ε̂_t − ε_t)| = o_p(1).

Theorem 1. Under H₀ and Assumption 1, as n → ∞,

T_n →_d sup_{u ∈ [τ₁, τ₂]} B(u)² / ( ∫_0^u B₁(s)² ds + ∫_u^1 B₂(s)² ds ),

where B(u) = W(u) − uW(1) for a standard Brownian motion W, and B₁ and B₂ are independent Brownian bridge-type processes on [0, u] and [u, 1], respectively.

In the proof below, B₁ and B₂ will arise as the limiting processes of the partial sum processes before and after the candidate change point k, respectively. Their independence follows from the independence of the increments of the original Brownian motion over disjoint intervals.

Proof of Theorem 1. Under H₀, the estimated residuals are approximately i.i.d. with mean zero and finite variance σ². Write S(j) = Σ_{t=1}^j ε̂_t. By the functional central limit theorem (Donsker's theorem), the partial sum process satisfies:

n^{−1/2} S(⌊nu⌋) ⇒ σ W(u),  u ∈ [0, 1],

where W is a standard Brownian motion. The centered partial sum process converges to a Brownian bridge:

n^{−1/2} [S(⌊nu⌋) − u S(n)] ⇒ σ B(u),  B(u) = W(u) − u W(1).

Now consider the numerator of T_n(k). Let u = k/n; then:

D_n(k) = n^{−1/2} (k(n−k)/n) (ε̄_{1,k} − ε̄_{2,k}).

Note that:

ε̄_{1,k} − ε̄_{2,k} = (1/k) S(k) − (1/(n−k)) [S(n) − S(k)].

Therefore,

D_n(k) = n^{−1/2} [S(k) − (k/n) S(n)].

As n → ∞ with k/n → u, we have n^{−1/2}[S(k) − (k/n)S(n)] ⇒ σB(u), and by the continuous mapping theorem:

D_n(k)² ⇒ σ² B(u)².

For the denominator, consider the two components separately. For 1 ≤ j ≤ k:

S_{1,k}(j) = S(j) − (j/k) S(k).

By the functional central limit theorem:

n^{−2} Σ_{j=1}^k S_{1,k}(j)² ⇒ σ² ∫_0^u B₁(s)² ds.

Here B₁(s) = W(s) − (s/u) W(u) is a Brownian bridge-type process on [0, u]. Similarly, for k < j ≤ n:

n^{−2} Σ_{j=k+1}^n S_{2,k}(j)² ⇒ σ² ∫_u^1 B₂(s)² ds.

Here B₂ is another Brownian bridge-type process, on [u, 1], independent of B₁.

By the continuous mapping theorem, the two components converge jointly. Therefore, the denominator converges to:

V_n(k) ⇒ σ² [ ∫_0^u B₁(s)² ds + ∫_u^1 B₂(s)² ds ].

Combining numerator and denominator, the scale factor σ² cancels:

T_n(k) ⇒ B(u)² / ( ∫_0^u B₁(s)² ds + ∫_u^1 B₂(s)² ds ).

Taking the supremum over u ∈ [τ₁, τ₂] gives the desired result.

Theorem 2. Under H₁ and Assumption 1, for any fixed change-point k* = ⌊nu*⌋ with u* ∈ (τ₁, τ₂), as n → ∞,

T_n →_p ∞.

Proof of Theorem 2. Under H₁, there exists a change-point at k* where the parameter vector changes from β to β + δ. This induces a mean shift in the residuals:

E[ε̂_t] ≈ 0 for t ≤ k*,  E[ε̂_t] ≈ Δ for t > k*,

where Δ ≠ 0 is a constant determined by the parameter change δ.

For the numerator, with S(j) = Σ_{t=1}^j ε̂_t as before:

D_n(k*) = n^{−1/2} [S(k*) − (k*/n) S(n)].

By the law of large numbers:

n^{−1} [S(k*) − (k*/n) S(n)] →_p −u*(1 − u*) Δ ≠ 0.

Therefore,

D_n(k*)² = n [u*(1 − u*) Δ]² (1 + o_p(1)) →_p ∞.

For the denominator, it remains bounded in probability:

V_n(k*) = O_p(1).

This is because the denominator is a self-normalizer based on residuals centered within each segment, which converges to a finite integral of Brownian bridge processes.

Hence,

T_n ≥ T_n(k*) = D_n(k*)² / V_n(k*) →_p ∞.

This completes the proof of consistency under the alternative hypothesis.

The proofs establish that our test statistic converges to a well-defined limiting distribution under the null hypothesis while diverging under alternatives, ensuring consistent detection power. Complete proofs are provided in S1 Text.

Results

Simulation studies

We conducted comprehensive Monte Carlo simulations to evaluate the finite-sample performance of our proposed method. Data were generated from AR(1) and ARMA(1,1) processes with sample sizes n = 200 and n = 500, representing moderate and large sample scenarios commonly encountered in practice. All results are based on 1,000 replications at the 0.05 significance level.

In the following tables, “Size” refers to the empirical size (type I error rate) under the null hypothesis of no change, while “Power” refers to the empirical power (detection rate) under the alternative hypothesis with a single change point. The parameters ϕ, θ, σ, and μ denote the autoregressive coefficient, moving average coefficient, error variance, and mean shift magnitude, respectively.
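The size/power bookkeeping behind the tables can be sketched generically (the helper and its interface are hypothetical, shown only to make the Monte Carlo definitions concrete; any test statistic and data generators can be plugged in):

```python
import numpy as np

def empirical_size_power(test, gen_null, gen_alt, crit, reps=200, seed=0):
    """Monte Carlo estimates of empirical size and power for a generic test.

    test     : maps a simulated series to a scalar statistic
    gen_null : rng -> series generated under H0 (no change)
    gen_alt  : rng -> series generated under H1 (single change)
    crit     : rejection threshold; reject when statistic > crit
    """
    rng = np.random.default_rng(seed)
    size = np.mean([test(gen_null(rng)) > crit for _ in range(reps)])
    power = np.mean([test(gen_alt(rng)) > crit for _ in range(reps)])
    return size, power
```

A well-calibrated test should report a size close to the nominal 0.05 level and a power that approaches 1 as the shift magnitude or sample size grows.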

Table 1 shows the empirical sizes and empirical powers of the AR(1) model in different cases. The test statistic controls the empirical size well across parameter values, with almost no size distortion. For example, in one representative setting the empirical power of the SVR-based self-normalization test statistic is 0.637, while the empirical powers of the two comparative tests of Lee et al. [12] are 0.523 and 0.484, respectively. From Table 1, the self-normalization test attains higher empirical power than both comparative tests in most cases. This result supports the effectiveness of our proposed SVR-based self-normalization test and the accuracy of its simulated critical values.

Table 1. Empirical sizes and powers for the AR(1) model.

https://doi.org/10.1371/journal.pone.0340729.t001

Table 2 shows the empirical sizes and powers of the ARMA(1,1) model in different cases. The test statistic generally controls the empirical size well across parameter settings, confirming that the proposed SVR-based self-normalization method approximates the critical value of the statistic accurately. For example, in one representative setting the empirical power of the SVR-based self-normalization test statistic is 0.636, while the empirical powers of the two comparative tests are 0.620 and 0.472, respectively. As the sample size increases, the empirical powers of all statistics improve and approach 1, and the proposed method outperforms both comparative tests in most cases.

Table 2. Empirical sizes and powers for the ARMA(1,1) model.

https://doi.org/10.1371/journal.pone.0340729.t002

Table 3 summarizes the empirical sizes and empirical powers of the ARMA(1,1) model for further cases. The test statistic again controls the empirical size well across parameter values, confirming the accuracy of the simulated critical values. For example, in one representative setting the empirical power of the SVR-based self-normalization test statistic is 0.884, while the empirical powers of the two comparative tests are 0.838 and 0.775, respectively. Thus, the proposed SVR-based self-normalization test outperforms both comparative tests in most cases.

Table 3. Empirical sizes and powers for the ARMA(1,1) model with ϕ = 0.3, θ = 0.3, σ = 1, and μ = 2.

https://doi.org/10.1371/journal.pone.0340729.t003

We note that in scenarios where only the error variance changes (see the relevant rows of Tables 1–3), the proposed method occasionally shows slightly lower power than in scenarios involving changes in mean-related parameters. This is consistent with the nature of self-normalization, which standardizes by a scale estimate and may be less sensitive to pure variance shifts than to mean-structure changes. This characteristic is known in the self-normalization literature and does not detract from the method’s primary strength in detecting mean-related structural breaks.

Tables 4 and 5 summarize the empirical sizes and empirical powers of the SVR-based self-normalization test statistic for different parameter values and change-point locations. The parameters, sample size, and location of the change point all significantly impact the empirical power of the statistic, and the SVR-based self-normalization test attains higher empirical power than the two comparative tests under most conditions. The empirical power also increases with the sample size: for example, at three representative change-point locations the empirical powers of the SVR-based self-normalization test statistic are 0.719, 0.985, and 0.927, while those of the two comparative tests are 0.689, 0.857, and 0.925, and 0.691, 0.777, and 0.870, respectively. Thus, the proposed test achieves higher power under the alternative hypothesis in most cases. The empirical power is higher when the change point lies near the middle of the sample than near either end, indicating that mid-sample change points are easier to detect; this supports the common practice of trimming the boundary region in the search.

Table 4. Empirical power of the ARMA(1,1) model with n = 200, ϕ = 0.3, θ = 0.3, σ = 1, and μ = 2 at different change-point locations under the alternative hypothesis (the no-change null case is included for reference).

https://doi.org/10.1371/journal.pone.0340729.t004

Table 5. Empirical power of the ARMA(1,1) model with n = 500, ϕ = 0.3, θ = 0.3, σ = 1, and μ = 2 at different change-point locations under the alternative hypothesis (the no-change null case is included for reference).

https://doi.org/10.1371/journal.pone.0340729.t005

Empirical illustration

We applied our method to two real-world datasets to validate its practical utility.

Analysis of the annual Nile River flow data (1871–1970) yielded a test statistic value of 315.52, substantially exceeding the critical value of 32.81 (α = 0.05). The detected change point corresponds to known historical patterns in Nile River flow regimes, potentially reflecting climate variations or human interventions during the measurement period. The close correspondence between our results and established findings in the hydrological literature [13] validates the method’s applicability to environmental time series (Fig 1).

Fig 1. Annual volume of discharge from the Nile River at Aswan from 1871 to 1970.

https://doi.org/10.1371/journal.pone.0340729.g001

The vertical dashed line indicates the detected change point at year 1898. The horizontal axis is labeled “Year” and the vertical axis is labeled “Discharge (m³/s)”.

Application to the Nikkei 225 index (2000–2021) produced a test statistic value of 70.25, again significantly exceeding the critical value (α = 0.05). The detected change points align with major financial events, including the 2008 global financial crisis and the COVID-19 market disruption. Comparison with established financial econometrics methods [14] showed strong agreement in change point identification, while our method offered the practical advantage of avoiding complex volatility modeling and parameter tuning (Fig 2).

Fig 2. Monthly log return data of 100 times Nikkei 225 index from Jan. 1, 2000, to Dec. 1, 2021.

https://doi.org/10.1371/journal.pone.0340729.g002

The vertical dashed lines indicate the detected change points in 2008 and 2020. The horizontal axis is labeled “Year” and the vertical axis is labeled “Log Return × 100”.

Discussion

This study has introduced a novel change-point detection methodology that integrates Support Vector Regression with a self-normalization framework. Through theoretical analysis, comprehensive simulations, and empirical applications, we have demonstrated that our approach effectively addresses key limitations of traditional change-point detection methods while maintaining robust performance across diverse scenarios.

The primary methodological contribution of our work is the development of a unified framework that eliminates the need for long-run variance estimation through self-normalization while leveraging SVR’s flexibility to capture complex data patterns. This combination provides several practical advantages: it avoids bandwidth selection and other tuning parameters that often complicate traditional methods; it enhances capability to detect changes in complex, potentially nonlinear time series; and it provides a solid theoretical foundation for statistical inference.

Our simulation results demonstrate that the proposed method outperforms existing SVR-based alternatives in terms of both size control and detection power. The empirical applications to hydrological and financial data further validate the method’s practical utility across different domains. For researchers and practitioners working with time series data, our method offers a robust, computationally efficient tool for structural break detection that requires minimal manual intervention.

The current framework is designed for single change-point detection. Extending it to multiple change points presents both methodological and computational challenges, including increased search complexity and potential interference between adjacent breaks. Future work could explore sequential detection strategies, such as binary segmentation or wild binary segmentation, integrated with the SVR-self-normalization framework. Additionally, the method’s sensitivity to the trimming parameters τ₁ and τ₂ was examined through supplementary simulations with alternative values, which showed stable performance, confirming robustness to reasonable parameter choices (see S1 Table for detailed results).
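The binary-segmentation extension mentioned above can be sketched around any single-change detector (a hypothetical wrapper: `detect` stands in for the SVR-SN test applied to a segment, and the minimum segment length is an illustrative choice):

```python
def binary_segmentation(x, detect, crit, min_len=20):
    """Recursive binary segmentation around a generic single-change detector.

    detect  : maps a segment to (statistic, change index within the segment)
    crit    : segments whose statistic exceeds crit are split and recursed
    min_len : smallest segment half-length worth testing
    """
    out = []
    def rec(lo, hi):
        if hi - lo < 2 * min_len:
            return
        t, k = detect(x[lo:hi])
        if t > crit:
            out.append(lo + k)       # record change point in global coordinates
            rec(lo, lo + k)          # recurse on the left sub-segment
            rec(lo + k, hi)          # recurse on the right sub-segment
    rec(0, len(x))
    return sorted(out)
```

In a full implementation, `crit` would itself be recalibrated for the multiple-testing structure, which is one of the challenges noted above.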

While the method demonstrates strong performance in univariate settings, many real-world applications involve multivariate time series. Developing a multivariate extension that can handle cross-dependent structures represents a valuable direction for future research. Integration with other machine learning techniques, such as deep learning or ensemble methods, could further enhance detection capability in high-dimensional or nonstationary environments.

In conclusion, the SVR-self-normalization approach provides a robust, theoretically sound, and practically useful framework for detecting structural change points in time series. The method’s strong performance across diverse scenarios, combined with its practical advantages in implementation, suggests broad applicability across scientific disciplines where reliable change-point detection is critical.

Supporting information

S1 Text. Proofs of Theorems 1 and 2.

Complete derivations and proofs for the theoretical results presented in Section 2.6 (Theoretical results) of the main text, establishing the asymptotic distribution of the test statistic under the null hypothesis and its consistency under the alternative.

https://doi.org/10.1371/journal.pone.0340729.s001

(DOCX)

S1 Table. Empirical size and power of the SVR-SN test under alternative trimming parameters.

Results of a sensitivity analysis examining the robustness of the proposed method to different trimming parameter choices. Empirical sizes and powers are reported for ARMA(1,1) models with n = 200 and 500, based on 1,000 Monte Carlo replications at the 0.05 significance level.

https://doi.org/10.1371/journal.pone.0340729.s002

(DOCX)

S1 Code. Python implementation of the SVR-SN algorithm.

Python scripts implementing the proposed SVR-SN change-point detection algorithm, including functions for data simulation, model fitting, test statistic calculation, Monte Carlo simulations (reproducing Tables 1–5), and applications to real-world datasets (Nile River and Nikkei 225).

https://doi.org/10.1371/journal.pone.0340729.s003

(DOCX)

Acknowledgments

The authors sincerely thank Professor Zhanshou Chen for his invaluable guidance and support throughout this research. We also appreciate the Editor, anonymous reviewers, and editorial staff of PLOS ONE for their time, valuable comments, and professional assistance during the review process.

References

  1. Jiang F, Zhao Z, Shao X. Time series analysis of COVID-19 infection curve: A change-point perspective. J Econom. 2023;232(1):1–17. pmid:32836681
  2. Ye W, Miao B, Ma Y. Contagion analysis of the U.S. subprime debt crisis based on variation point detection of hazard rate function. Systems Engineering—Theory & Practice. 2010;30(3).
  3. Huang X, Tian Z, Qin R, Wang X, Zheng C. Real-time variation point detection of meteorological data based on hybrid model adaptive LASSO method. Journal of Guangxi University for Nationalities (Natural Sciences Edition). 2021;27(2).
  4. Page ES. Continuous inspection schemes. Biometrika. 1954;41(1/2):100.
  5. Lee S, Kim CK, Lee S. Hybrid CUSUM change point test for time series with time-varying volatilities based on support vector regression. Entropy (Basel). 2020;22(5):578. pmid:33286350
  6. Csörgő M, Horváth L. Limit theorems in change-point analysis. John Wiley & Sons; 1997.
  7. Inclán C, Tiao GC. Use of cumulative sums of squares for retrospective detection of changes of variance. Journal of the American Statistical Association. 1994;89(427):913–23.
  8. Truong C, Oudre L, Vayatis N. Selective review of offline change point detection methods. Signal Processing. 2020;167:107299.
  9. Smola AJ, Schölkopf B. A tutorial on support vector regression. Statistics and Computing. 2004;14(3):199–222.
  10. Shao X. A simple test of changes in mean in the possible presence of long-range dependence. Journal of Time Series Analysis. 2011;32(6):598–606.
  11. Francq C, Zakoïan J-M. Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes. Bernoulli. 2004;10(4).
  12. Lee S, Lee S, Moon M. Hybrid change point detection for time series via support vector regression and CUSUM method. Applied Soft Computing. 2020;89:106101.
  13. Cobb GW. The problem of the Nile: Conditional solution to a changepoint problem. Biometrika. 1978;65(2):243–51.
  14. Bai J, Perron P. Estimating and testing linear models with multiple structural changes. Econometrica. 1998;66(1):47.