Abstract
Background
The detection of structural change points in time series is a fundamental problem in statistical analysis, with significant implications across numerous scientific disciplines. Traditional change-point detection methods often face challenges in consistently estimating the long-run variance of time series, which can limit their practical application.
Methodology/Principal findings
This paper introduces a novel change-point detection methodology that integrates Support Vector Regression (SVR) with a self-normalization framework. By leveraging SVR's flexible modeling capabilities to obtain accurate residual estimates and employing a self-normalized test statistic, our approach circumvents the need for long-run variance estimation. Under the null hypothesis of no structural change, the test statistic converges to a non-degenerate limiting distribution, while under the alternative hypothesis, it diverges to infinity, ensuring consistent detection power. Extensive simulation studies demonstrate that our method outperforms existing SVR-based tests in finite-sample performance, offering improved size control (empirical size close to nominal 0.05 level) and higher detection power across various scenarios. Empirical applications to hydrological and financial time series (Nile River flow data and Nikkei 225 index) validate the method's practical utility in real-world settings.
Conclusions/Significance
The proposed framework provides a robust, parameter-free tool for analyzing structural instability in time series, with particular advantages in handling complex, nonlinear data structures. The method’s avoidance of tuning parameters and consistent performance across different domains suggest broad applicability in scientific research and practical applications.
Citation: Xie N (2026) A self-normalization and support vector regression based approach for detecting structural change points in time series. PLoS One 21(4): e0340729. https://doi.org/10.1371/journal.pone.0340729
Editor: Alessandro Mazzoccoli, Roma Tre University: Universita degli Studi Roma Tre, ITALY
Received: November 6, 2025; Accepted: December 24, 2025; Published: April 7, 2026
Copyright: © 2026 Nini Xie. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data used in this study are from publicly available sources, and the complete workflow to download, process, and analyze the data is provided to ensure full reproducibility. Nile River Data: The annual flow data can be accessed directly via the datasets package in R (command: data(“Nile”)) or from its primary source cited in [13]. Nikkei 225 Data: The underlying index data can be downloaded from Yahoo Finance (https://finance.yahoo.com). The Python script in S1 Code includes the exact query used to retrieve the data for the period 2000–2021. Simulation Data: All Monte Carlo simulation datasets are generated by the code provided in S1 Code. No pre-generated simulation data files are required. The complete analysis pipeline, including data acquisition, preprocessing, simulation, and statistical testing, is documented in the S1 Code (Supporting Information). The minimal anonymized dataset necessary to replicate the study findings has been uploaded to the Zenodo public repository: Repository: Zenodo DOI: 10.5281/zenodo.18091929.
Funding: This work was supported by the following: - The National Natural Science Foundation of China (Grant No. 12161072) to Z.C. - The Natural Science Foundation of Qinghai Province (Grant No. 2024-ZJ-933) to Z.C. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The identification of structural change points in time series data represents a fundamental challenge in statistical analysis with far-reaching implications across scientific disciplines. These change points—moments when the underlying data-generating process undergoes significant alteration—provide critical insights into system dynamics, regime shifts, and anomalous events. In epidemiological monitoring, change-point detection can signal shifts in disease transmission rates, enabling timely public health interventions [1]. Financial analysts rely on structural break detection to identify market regime changes and economic turning points [2], while environmental scientists use these methods to detect climate pattern shifts and extreme weather events [3].
Despite considerable methodological advances since Page’s pioneering work [4], traditional change-point detection methods face persistent challenges. Most conventional approaches require consistent estimation of long-run variance to derive valid asymptotic distributions for test statistics [5,6]. This estimation typically involves selecting bandwidth parameters that substantially influence test performance [7], creating a source of potential instability in practical applications. The emergence of complex, high-dimensional datasets in the big data era has further exacerbated these challenges, as modern time series often exhibit nonlinear patterns, heterogeneous variance, and multiple change-point types that traditional methods struggle to detect reliably [8].
Recent methodological innovations have sought to address these limitations through machine learning approaches. Support Vector Regression (SVR) has emerged as a particularly promising technique due to its flexibility in capturing complex, nonlinear relationships with minimal tuning requirements [9]. Simultaneously, self-normalization methods have gained attention for their ability to provide robust statistical inference without requiring explicit long-run variance estimation [10]. However, the integration of these two powerful approaches for change-point detection remains largely unexplored.
This paper bridges this methodological gap by developing a unified framework that combines SVR’s modeling flexibility with the statistical robustness of self-normalization. Our approach specifically targets the ARMA model class, which provides a flexible yet parsimonious framework for modeling dependent data while maintaining interpretability. The proposed method offers several advantages over existing approaches: it eliminates the need for bandwidth selection and other tuning parameters that often complicate traditional methods; it leverages SVR’s capability to capture complex data patterns; and it provides a solid theoretical foundation for statistical inference.
We establish three primary contributions. First, we develop a novel change-point detection methodology that integrates SVR with self-normalization in a theoretically grounded framework. Second, we derive the asymptotic properties of the proposed test statistic, proving its convergence under the null hypothesis and consistency under alternatives. Third, we demonstrate through extensive simulations and real-world applications that our method outperforms existing approaches while maintaining practical applicability across diverse domains.
Materials and methods
Model specification and hypotheses
We consider the stationary ARMA(p, q) model as our baseline framework. Let $\{y_t\}_{t=1}^{n}$ represent the observed time series, which follows the general form:

$$y_t = \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j},$$

where $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$ are real-valued parameters, and $\{\varepsilon_t\}$ represents a sequence of independent and identically distributed random variables with mean zero and variance $\sigma^2$.

Let $\beta = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)'$ denote the complete parameter vector. The change-point detection problem is formalized through the following hypothesis framework:

- Null hypothesis ($H_0$): No structural change exists, implying parameter constancy throughout the observation period: $\beta_t = \beta_0$ for $t = 1, \ldots, n$.
- Alternative hypothesis ($H_1$): A single change point exists at location $k^*$, producing a structural break in the parameter vector: $\beta_t = \beta_0$ for $t = 1, \ldots, k^*$, and $\beta_t = \beta_0 + \delta$ for $t = k^* + 1, \ldots, n$, where $\delta \neq 0$ represents a fixed vector of parameter changes.
Residual estimation via SVR-ARMA
The self-normalization approach requires accurate residual estimates, which we obtain through an SVR-ARMA framework. The residuals are computed recursively as:

$$\hat{\varepsilon}_t = y_t - \sum_{i=1}^{p} \hat{\phi}_i\, y_{t-i} - \sum_{j=1}^{q} \hat{\theta}_j\, \hat{\varepsilon}_{t-j}, \qquad t = 1, \ldots, n, \quad (3)$$

with initial conditions $y_t = \hat{\varepsilon}_t = 0$ for all $t \le 0$. The parameter estimates $\hat{\phi}_1, \ldots, \hat{\phi}_p$ and $\hat{\theta}_1, \ldots, \hat{\theta}_q$ are obtained via Gaussian quasi-maximum likelihood estimation [11], ensuring consistency under standard regularity conditions. Specifically, we use the statsmodels library (version 0.14.0) in Python, fitting the ARMA(p, q) model with the ARIMA class (order (p, 0, q)), whose fit() method implements Gaussian quasi-maximum likelihood estimation. The integration of SVR enhances the framework’s capacity to capture potential nonlinearities and complex dependencies that may be present in the time series.
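The recursion in Eq (3) can be sketched in a few lines of Python. The function name is illustrative; in practice the parameter estimates would come from the QMLE fit described above rather than being supplied by hand.

```python
import numpy as np

def arma_residuals(y, phi, theta):
    """Recursively compute ARMA(p, q) residuals as in Eq (3),
    with y_t and eps_t taken as zero for t <= 0."""
    y = np.asarray(y, dtype=float)
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    n, p, q = len(y), len(phi), len(theta)
    eps = np.zeros(n)
    for t in range(n):
        # AR part: sum of phi_i * y_{t-i}, skipping pre-sample terms
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # MA part: sum of theta_j * eps_{t-j}, skipping pre-sample terms
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        eps[t] = y[t] - ar - ma
    return eps
```

When the supplied coefficients equal the true ones and the series was itself generated with zero pre-sample values, the recursion recovers the innovations exactly.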
Self-normalization test statistic
We construct the self-normalization test statistic using a cumulative sum approach based on the estimated residuals. For the partial sum process, we define the segment means and cumulative sums as follows. For $1 \le t \le k$ and $k < t \le n$, respectively:

$$\bar{\varepsilon}_{1,k} = \frac{1}{k}\sum_{j=1}^{k}\hat{\varepsilon}_j, \quad \bar{\varepsilon}_{2,k} = \frac{1}{n-k}\sum_{j=k+1}^{n}\hat{\varepsilon}_j, \quad S_{1,t}(k) = \sum_{j=1}^{t}\big(\hat{\varepsilon}_j - \bar{\varepsilon}_{1,k}\big), \quad S_{2,t}(k) = \sum_{j=t}^{n}\big(\hat{\varepsilon}_j - \bar{\varepsilon}_{2,k}\big).$$

For each potential change point location $k$, we define the self-normalized statistic:

$$T_n(k) = \frac{\left[\,n^{-1/2}\Big(\sum_{j=1}^{k}\hat{\varepsilon}_j - \frac{k}{n}\sum_{j=1}^{n}\hat{\varepsilon}_j\Big)\right]^2}{\frac{1}{n^2}\left[\sum_{t=1}^{k} S_{1,t}(k)^2 + \sum_{t=k+1}^{n} S_{2,t}(k)^2\right]}.$$

The final test statistic represents the supremum over a trimmed search region:

$$G_n = \sup_{\lfloor n\tau_1 \rfloor \le k \le \lfloor n(1-\tau_2) \rfloor} T_n(k),$$

where $\tau_1$ and $\tau_2$ are trimming parameters that exclude the boundaries of the sample. We employ the same values of $\tau_1$ and $\tau_2$ throughout our analysis to ensure sufficient observations in each segment for reliable estimation while maintaining reasonable computational efficiency.
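As a concrete sketch of this construction, the following function computes the self-normalized CUSUM statistic and its maximizer from a residual series. It assumes the standard self-normalized CUSUM form; the trimming defaults and the function name are illustrative choices, not the paper's settings.

```python
import numpy as np

def sn_statistic(resid, tau1=0.1, tau2=0.1):
    """Self-normalized CUSUM statistic G_n = sup_k T_n(k) over the
    trimmed region, plus the maximizing candidate change point k."""
    e = np.asarray(resid, dtype=float)
    n = len(e)
    csum = np.cumsum(e)
    total = csum[-1]
    lo = max(int(n * tau1), 2)
    hi = min(int(n * (1 - tau2)), n - 2)
    best_stat, best_k = -np.inf, lo
    for k in range(lo, hi + 1):
        # CUSUM numerator at candidate k
        num = (csum[k - 1] - (k / n) * total) ** 2 / n
        # self-normalizer: partial sums of residuals centered
        # at their own segment means, before and after k
        left = e[:k] - e[:k].mean()
        right = e[k:] - e[k:].mean()
        S1 = np.cumsum(left)
        S2 = np.cumsum(right[::-1])[::-1]  # backward sums S_{2,t}
        denom = (np.sum(S1 ** 2) + np.sum(S2 ** 2)) / n ** 2
        stat = num / denom
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k
```

Because numerator and denominator scale identically, the statistic is invariant to rescaling the residuals, which is exactly why no long-run variance estimate is needed.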
Algorithm implementation
We formalize the complete change-point detection procedure in Algorithm 1:
Algorithm 1: SVR-SN Change-Point Detection
Input: Time series $\{y_t\}_{t=1}^{n}$, trimming parameters $\tau_1$, $\tau_2$.
Output: Test statistic $G_n$, change-point estimate $\hat{k}$.
(1) Fit the SVR-ARMA model to the time series using Gaussian quasi-maximum likelihood estimation to obtain parameter estimates $\hat{\phi}_1, \ldots, \hat{\phi}_p$ and $\hat{\theta}_1, \ldots, \hat{\theta}_q$, then compute the residuals $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_n$ using Eq (3).
(2) for $k = \lfloor n\tau_1 \rfloor$ to $\lfloor n(1-\tau_2) \rfloor$ do
(3) Compute the segment means $\bar{\varepsilon}_{1,k}$ and $\bar{\varepsilon}_{2,k}$.
(4) Calculate the partial sum processes $S_{1,t}(k)$ and $S_{2,t}(k)$.
(5) Compute $T_n(k)$ using the equation above.
(6) end for
(7) $G_n \leftarrow \max_{k} T_n(k)$.
(8) $\hat{k} \leftarrow \arg\max_{k} T_n(k)$.
(9) return $G_n$, $\hat{k}$.
The Python implementation of the proposed SVR-SN algorithm, including code for simulations and empirical applications, is available in S1 Code.
Comparative methods
For performance benchmarking, we compare our method against two established SVR-based tests proposed by Lee et al. [12]: a likelihood-score based CUSUM test and a maximum-type test. Both comparison statistics require explicit residual-variance estimators; their precise forms are given in [12]. For all tests, we reject $H_0$ at significance level $\alpha = 0.05$ when the test statistic exceeds the critical value $c_\alpha$, obtained via Monte Carlo simulation.
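To illustrate how such a critical value can be obtained, the sketch below approximates the upper quantile of the limiting null distribution by discretizing a Brownian motion, assuming the standard self-normalized CUSUM limit (a ratio of Brownian-bridge functionals). The grid size, replication count, trimming values, and function name are all illustrative choices, not the paper's settings.

```python
import numpy as np

def sn_limit_quantile(alpha=0.05, tau1=0.1, tau2=0.1,
                      grid=200, reps=2000, seed=0):
    """Monte Carlo approximation of the (1 - alpha) quantile of the
    limiting distribution of the self-normalized CUSUM statistic."""
    rng = np.random.default_rng(seed)
    s = np.arange(1, grid + 1) / grid
    stats = np.empty(reps)
    lo, hi = int(grid * tau1), int(grid * (1 - tau2))
    for m in range(reps):
        # discretized standard Brownian motion on [0, 1]
        W = np.cumsum(rng.normal(size=grid)) / np.sqrt(grid)
        B = W - s * W[-1]  # Brownian bridge values
        sup = -np.inf
        for k in range(lo, hi):
            r = s[k]
            # left normalizer: int_0^r (W(u) - (u/r) W(r))^2 du
            left = W[:k + 1] - (s[:k + 1] / r) * W[k]
            # right normalizer over (r, 1], built from later increments
            right = (W[-1] - W[k + 1:]) \
                - ((1 - s[k + 1:]) / (1 - r)) * (W[-1] - W[k])
            denom = (np.sum(left ** 2) + np.sum(right ** 2)) / grid
            sup = max(sup, B[k] ** 2 / denom)
        stats[m] = sup
    return np.quantile(stats, 1 - alpha)
```

In practice a finer grid and more replications would be used; the simulated quantile then serves as $c_\alpha$ for the test.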
Theoretical results
Theoretical properties
We first establish the theoretical foundation of our approach through two key theorems.
Assumption 1. The time series $\{y_t\}$ is strictly stationary and ergodic with $E|\varepsilon_t|^{2+\nu} < \infty$ for some $\nu > 0$. The parameter estimators $\hat{\phi}_i$ and $\hat{\theta}_j$ are $\sqrt{n}$-consistent, and the estimated residuals satisfy $n^{-1/2}\sum_{t=1}^{n}\big|\hat{\varepsilon}_t - \varepsilon_t\big| = o_p(1)$.
Theorem 1. Under $H_0$ and Assumption 1, as $n \to \infty$,

$$G_n \xrightarrow{d} \sup_{r \in [\tau_1,\, 1-\tau_2]} \frac{B(r)^2}{r^2 \int_0^1 B_1(s)^2\, ds + (1-r)^2 \int_0^1 B_2(s)^2\, ds},$$

where $B(r) = W(r) - rW(1)$ for a standard Brownian motion $W$, and $B$, $B_1$, and $B_2$ are Brownian bridges on $[0, 1]$, with $B_1$ and $B_2$ independent.

In the proof below, $B_1$ and $B_2$ arise as the limiting processes of the partial sum processes before and after the candidate change point $k$, respectively. Their independence follows from the independence of the increments of the original Brownian motion $W$ over disjoint intervals.
Proof of Theorem 1. Under $H_0$, the estimated residuals $\hat{\varepsilon}_t$ are approximately i.i.d. with mean zero and finite variance $\sigma^2$. By the functional central limit theorem (Donsker’s theorem), the partial sum process satisfies:

$$\frac{1}{\sigma\sqrt{n}}\sum_{t=1}^{\lfloor nr \rfloor}\hat{\varepsilon}_t \Rightarrow W(r), \qquad r \in [0,1],$$

where $W$ is a standard Brownian motion. The centered partial sum process converges to a Brownian bridge:

$$\frac{1}{\sigma\sqrt{n}}\left(\sum_{t=1}^{\lfloor nr \rfloor}\hat{\varepsilon}_t - \frac{\lfloor nr \rfloor}{n}\sum_{t=1}^{n}\hat{\varepsilon}_t\right) \Rightarrow B(r) = W(r) - rW(1).$$

Now consider the numerator of $T_n(k)$. Let $k = \lfloor nr \rfloor$. Note that the numerator is exactly the centered partial sum process evaluated at $r$. As $n \to \infty$, $k/n \to r$, and by the continuous mapping theorem:

$$\left[\frac{1}{\sqrt{n}}\left(\sum_{j=1}^{k}\hat{\varepsilon}_j - \frac{k}{n}\sum_{j=1}^{n}\hat{\varepsilon}_j\right)\right]^2 \Rightarrow \sigma^2 B(r)^2.$$

For the denominator, consider the two components separately. For $t \le k$, with $t = \lfloor nu \rfloor$ and $0 \le u \le r$:

$$\frac{1}{\sigma\sqrt{n}}\, S_{1,\lfloor nu \rfloor}(k) \Rightarrow W(u) - \frac{u}{r}W(r) = \sqrt{r}\, B_1\!\left(\frac{u}{r}\right).$$

Here $B_1$ is a Brownian bridge on $[0,1]$. Similarly, for $t > k$, with $r \le u \le 1$:

$$\frac{1}{\sigma\sqrt{n}}\, S_{2,\lfloor nu \rfloor}(k) \Rightarrow \big(W(1) - W(u)\big) - \frac{1-u}{1-r}\big(W(1) - W(r)\big) = \sqrt{1-r}\, B_2\!\left(\frac{u-r}{1-r}\right).$$

Here $B_2$ is another Brownian bridge, independent of $B_1$ because it is built from increments of $W$ over $(r, 1]$.

By the continuous mapping theorem:

$$\frac{1}{n^2}\sum_{t=1}^{k} S_{1,t}(k)^2 \Rightarrow \sigma^2 r^2 \int_0^1 B_1(s)^2\, ds, \qquad \frac{1}{n^2}\sum_{t=k+1}^{n} S_{2,t}(k)^2 \Rightarrow \sigma^2 (1-r)^2 \int_0^1 B_2(s)^2\, ds.$$

Therefore, the denominator converges to:

$$\sigma^2\left[r^2 \int_0^1 B_1(s)^2\, ds + (1-r)^2 \int_0^1 B_2(s)^2\, ds\right].$$

Combining numerator and denominator, the factor $\sigma^2$ cancels:

$$T_n(\lfloor nr \rfloor) \Rightarrow \frac{B(r)^2}{r^2 \int_0^1 B_1(s)^2\, ds + (1-r)^2 \int_0^1 B_2(s)^2\, ds}.$$

Taking the supremum over $r \in [\tau_1, 1-\tau_2]$ gives the desired result.
Theorem 2. Under $H_1$ and Assumption 1, for any fixed change point $k^* = \lfloor n\tau^* \rfloor$ with $\tau^* \in (\tau_1, 1-\tau_2)$, as $n \to \infty$,

$$G_n \xrightarrow{p} \infty.$$

Proof of Theorem 2. Under $H_1$, there exists a change point at $k^*$ where the parameter vector changes from $\beta_0$ to $\beta_0 + \delta$. This induces a mean shift in the residuals:

$$E(\hat{\varepsilon}_t) \approx \begin{cases} 0, & t \le k^*, \\ \Delta, & t > k^*, \end{cases}$$

where $\Delta \neq 0$ is a constant determined by the parameter change $\delta$.

For the numerator, evaluate $T_n(k)$ at $k = k^*$. By the law of large numbers:

$$\frac{1}{n}\sum_{j=1}^{k^*}\hat{\varepsilon}_j \xrightarrow{p} 0, \qquad \frac{1}{n}\sum_{j=1}^{n}\hat{\varepsilon}_j \xrightarrow{p} (1-\tau^*)\Delta.$$

Therefore,

$$\frac{1}{n}\left(\sum_{j=1}^{k^*}\hat{\varepsilon}_j - \frac{k^*}{n}\sum_{j=1}^{n}\hat{\varepsilon}_j\right) \xrightarrow{p} -\tau^*(1-\tau^*)\Delta \neq 0,$$

so the squared, $n^{-1/2}$-scaled numerator of $T_n(k^*)$ diverges at rate $n$.

For the denominator, it remains bounded in probability:

$$\frac{1}{n^2}\left[\sum_{t=1}^{k^*} S_{1,t}(k^*)^2 + \sum_{t=k^*+1}^{n} S_{2,t}(k^*)^2\right] = O_p(1).$$

This is because the self-normalizer is built from residuals centered at their own segment means; at $k = k^*$ each segment is homogeneous, so the centered partial sums behave as under the null and converge to a finite integral of Brownian bridge processes.

Hence,

$$G_n \ge T_n(k^*) \xrightarrow{p} \infty.$$

This completes the proof of consistency under the alternative hypothesis.
The proofs establish that our test statistic converges to a well-defined limiting distribution under the null hypothesis while diverging under alternatives, ensuring consistent detection power. Complete proofs are provided in S1 Text.
Results
Simulation studies
We conducted comprehensive Monte Carlo simulations to evaluate the finite-sample performance of our proposed method. Data were generated from AR(1) and ARMA(1,1) processes with sample sizes up to $n = 500$, representing moderate and large sample scenarios commonly encountered in practice. All results are based on 1,000 replications at the 0.05 significance level.
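As an illustration of the kind of data-generating process used under the alternative, the following sketch produces an AR(1) series whose innovation mean shifts at a known point. The parameter values and function name are illustrative, not the exact designs of Tables 1–5.

```python
import numpy as np

def simulate_ar1_with_break(n, phi, sigma, delta, k_star, seed=None):
    """Generate an AR(1) series y_t = phi * y_{t-1} + eps_t whose
    innovation mean shifts by delta after observation k_star
    (delta = 0 reproduces the null of no change)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    eps[k_star:] += delta          # mean shift under the alternative
    y = np.empty(n)
    y[0] = eps[0]
    for t in range(1, n):
        y[t] = phi * y[t - 1] + eps[t]
    return y
```

A mean shift of $\Delta$ in the innovations moves the stationary mean of the series by $\Delta / (1 - \phi)$ after the break, which is what the CUSUM-based statistic picks up.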
In the following tables, “Size” refers to the empirical size (type I error rate) under the null hypothesis of no change, while “Empirical potential” refers to the empirical power (detection rate) under the alternative hypothesis with a single change point. The parameters $\phi$, $\theta$, $\sigma^2$, and $\Delta$ denote the autoregressive coefficient, moving average coefficient, error variance, and mean shift magnitude, respectively.
Table 1 shows the empirical sizes and empirical powers for the AR(1) model in different cases. The test statistic controls the empirical size well across parameter values, with almost no size distortion, which demonstrates the effectiveness of the proposed SVR-based self-normalization test in approximating the critical values of the statistic. For example, in one representative configuration, the empirical power of the SVR-based self-normalization test statistic is 0.637, while the empirical powers of the likelihood-score and maximum-type tests are 0.523 and 0.484, respectively. Across Table 1, the self-normalization test attains higher empirical power than the two comparison tests in most cases.
Table 2 shows the empirical size and power for the ARMA(1,1) model in different cases. The test statistic generally controls the empirical size well across parameter settings, again indicating that the SVR-based self-normalization method approximates the critical value of the statistic accurately. For example, in one representative configuration, the empirical power of the SVR-based self-normalization test statistic is 0.636, while the empirical powers of the likelihood-score and maximum-type tests are 0.620 and 0.472, respectively. As the sample size increases, the empirical powers of all statistics improve and approach 1, and the proposed SVR-based self-normalization method outperforms both comparison tests in most cases.
Table 3 summarizes the empirical size and empirical power for the ARMA(1,1) model under additional settings. The test statistic again controls the empirical size well across parameter values. For example, in one representative configuration, the empirical power of the SVR-based self-normalization test statistic is 0.884, while the empirical powers of the likelihood-score and maximum-type tests are 0.838 and 0.775, respectively. Thus, the proposed SVR-based self-normalization test outperforms both comparison tests in most cases.
We note that in scenarios where only the error variance $\sigma^2$ changes (Tables 1–3), the proposed method occasionally shows slightly lower power than it does for changes in mean-related parameters. This is consistent with the nature of self-normalization, which standardizes by a scale estimate and may therefore be less sensitive to pure variance shifts than to changes in the mean structure. This characteristic is well documented in the self-normalization literature and does not detract from the method’s primary strength in detecting mean-related structural breaks.
Tables 4 and 5 summarize the empirical sizes and empirical powers of the SVR-based self-normalization test statistic for different parameter values, sample sizes, and change-point locations. All three factors significantly affect the empirical power, and the SVR-based self-normalization test statistic attains higher power than the likelihood-score and maximum-type tests under most conditions, with power increasing in the sample size. For example, in one set of configurations, the empirical powers of the SVR-based self-normalization test statistic are 0.719, 0.985, and 0.927, compared with 0.689, 0.857, and 0.925 for the likelihood-score test and 0.691, 0.777, and 0.870 for the maximum-type test, respectively. Thus, the proposed test achieves higher power under the alternative hypothesis in most cases. The empirical power of the SVR-based self-normalization test is also higher when the change point lies near the middle of the sample than near either end, indicating that mid-sample change points are easier to detect; this is consistent with the assumption that the change point occurs away from the sample boundaries.
Empirical illustration
We applied our method to two real-world datasets to validate its practical utility.
Analysis of the annual Nile River flow data (1871–1970) yielded a test statistic value of 315.52, substantially exceeding the critical value of 32.81 at the 0.05 significance level. The detected change point corresponds to known historical patterns in Nile River flow regimes, potentially reflecting climate variations or human interventions during the measurement period. The close correspondence between our results and established findings in the hydrological literature [13] validates the method’s applicability to environmental time series (Fig 1).
The vertical dashed line indicates the detected change point at year 1898. The horizontal axis is labeled “Year” and the vertical axis is labeled “Discharge (m³/s)”.
Application to the Nikkei 225 index (2000–2021) produced a test statistic value of 70.25, again significantly exceeding the critical value at the 0.05 level. The detected change points align with major financial events, including the 2008 global financial crisis and the COVID-19 market disruption. Comparison with established financial econometrics methods [14] showed strong agreement in change-point identification, while our method offered the practical advantage of avoiding complex volatility modeling and parameter tuning (Fig 2).
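The index is analyzed on the log-return scale shown in Fig 2. A minimal sketch of that transformation is given below; it assumes a plain price array, with the actual data retrieval handled by S1 Code.

```python
import numpy as np

def log_returns_pct(prices):
    """Convert a price series to log returns in percent,
    the scale used on the vertical axis of Fig 2."""
    p = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(p))
```

The resulting return series is then passed through the residual-fitting and self-normalized testing steps described in the Materials and methods section.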
The vertical dashed lines indicate the detected change points in 2008 and 2020. The horizontal axis is labeled “Year” and the vertical axis is labeled “Log Return × 100”.
Discussion
This study has introduced a novel change-point detection methodology that integrates Support Vector Regression with a self-normalization framework. Through theoretical analysis, comprehensive simulations, and empirical applications, we have demonstrated that our approach effectively addresses key limitations of traditional change-point detection methods while maintaining robust performance across diverse scenarios.
The primary methodological contribution of our work is the development of a unified framework that eliminates the need for long-run variance estimation through self-normalization while leveraging SVR’s flexibility to capture complex data patterns. This combination provides several practical advantages: it avoids bandwidth selection and other tuning parameters that often complicate traditional methods; it enhances capability to detect changes in complex, potentially nonlinear time series; and it provides a solid theoretical foundation for statistical inference.
Our simulation results demonstrate that the proposed method outperforms existing SVR-based alternatives in terms of both size control and detection power. The empirical applications to hydrological and financial data further validate the method’s practical utility across different domains. For researchers and practitioners working with time series data, our method offers a robust, computationally efficient tool for structural break detection that requires minimal manual intervention.
The current framework is designed for single change-point detection. Extending it to multiple change points presents both methodological and computational challenges, including increased search complexity and potential interference between adjacent breaks. Future work could explore sequential detection strategies, such as binary segmentation or wild binary segmentation, integrated with the SVR-self-normalization framework. Additionally, the method’s sensitivity to the trimming parameters $\tau_1$ and $\tau_2$ was examined through supplementary simulations with alternative values, which showed stable performance, confirming its robustness to reasonable parameter choices (see S1 Table for detailed results).
While the method demonstrates strong performance in univariate settings, many real-world applications involve multivariate time series. Developing a multivariate extension that can handle cross-dependent structures represents a valuable direction for future research. Integration with other machine learning techniques, such as deep learning or ensemble methods, could further enhance detection capability in high-dimensional or nonstationary environments.
In conclusion, the SVR-self-normalization approach provides a robust, theoretically sound, and practically useful framework for detecting structural change points in time series. The method’s strong performance across diverse scenarios, combined with its practical advantages in implementation, suggests broad applicability across scientific disciplines where reliable change-point detection is critical.
Supporting information
S1 Text. Proofs of Theorems 1 and 2.
Complete derivations and proofs for the theoretical results presented in Section 2.6 (Theoretical results) of the main text, establishing the asymptotic distribution of the test statistic under the null hypothesis and its consistency under the alternative.
https://doi.org/10.1371/journal.pone.0340729.s001
(DOCX)
S1 Table. Empirical size and power of the SVR-SN test under alternative trimming parameters $\tau_1$ and $\tau_2$. Results of sensitivity analysis examining the robustness of the proposed method to different trimming parameter choices. Empirical sizes and powers are reported for ARMA(1,1) models with n = 200 and 500, based on 1,000 Monte Carlo replications at the 0.05 significance level.
https://doi.org/10.1371/journal.pone.0340729.s002
(DOCX)
S1 Code. Python implementation of the SVR-SN algorithm.
Python scripts implementing the proposed SVR-SN change-point detection algorithm, including functions for data simulation, model fitting, test statistic calculation, Monte Carlo simulations (reproducing Tables 1–5), and applications to real-world datasets (Nile River and Nikkei 225).
https://doi.org/10.1371/journal.pone.0340729.s003
(DOCX)
Acknowledgments
The authors sincerely thank Professor Zhanshou Chen for his invaluable guidance and support throughout this research. We also appreciate the Editor, anonymous reviewers, and editorial staff of PLOS ONE for their time, valuable comments, and professional assistance during the review process.
References
- 1. Jiang F, Zhao Z, Shao X. Time series analysis of COVID-19 infection curve: A change-point perspective. J Econom. 2023;232(1):1–17. pmid:32836681
- 2. Ye W, Miao B, Ma Y. Contagion analysis of the U.S. subprime debt crisis based on variation point detection of hazard rate function. Systems Engineering—Theory & Practice. 2010;30(3).
- 3. Huang X, Tian Z, Qin R, Wang X, Zheng C. Real-time variation point detection of meteorological data based on hybrid model adaptive LASSO method. Journal of Guangxi University for Nationalities (Natural Sciences Edition). 2021;27(2).
- 4. Page ES. Continuous Inspection Schemes. Biometrika. 1954;41(1/2):100.
- 5. Lee S, Kim CK, Lee S. Hybrid CUSUM Change Point Test for Time Series with Time-Varying Volatilities Based on Support Vector Regression. Entropy (Basel). 2020;22(5):578. pmid:33286350
- 6. Csörgő M, Horváth L. Limit theorems in change-point analysis. John Wiley & Sons; 1997.
- 7. Inclán C, Tiao GC. Use of Cumulative Sums of Squares for Retrospective Detection of Changes of Variance. Journal of the American Statistical Association. 1994;89(427):913–23.
- 8. Truong C, Oudre L, Vayatis N. Selective review of offline change point detection methods. Signal Processing. 2020;167:107299.
- 9. Smola AJ, Schölkopf B. A tutorial on support vector regression. Statistics and Computing. 2004;14(3):199–222.
- 10. Shao X. A simple test of changes in mean in the possible presence of long-range dependence. Journal of Time Series Analysis. 2011;32(6):598–606.
- 11. Francq C, Zakoïan J-M. Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes. Bernoulli. 2004;10(4).
- 12. Lee S, Lee S, Moon M. Hybrid change point detection for time series via support vector regression and CUSUM method. Applied Soft Computing. 2020;89:106101.
- 13. Cobb GW. The problem of the Nile: Conditional solution to a changepoint problem. Biometrika. 1978;65(2):243–51.
- 14. Bai J, Perron P. Estimating and Testing Linear Models with Multiple Structural Changes. Econometrica. 1998;66(1):47–78.