Estimating endogenous treatments effects under long-range dependency without untreated controls

Abstract

The identification and estimation of social policy effects through time-series natural experiments is fundamental in modern econometrics. However, challenges arise from the heterogeneity caused by staggered treatment adoption and the endogeneity caused by omitted variables. In this paper, we propose a novel method to identify and estimate the effects of several endogenous treatments adopted in a staggered fashion when neither a suitable untreated unit nor an instrumental variable is available. First, we propose a conditional mean symmetry condition by projecting the potential outcomes onto a sub-linear space spanned by the proposed common proximal variable. Under this condition, we can rule out confounding biases. Second, a proposed weak index restriction, constructed from Bernstein expansions and satisfying a conditional mean independence property, enables us to consistently estimate multiple heterogeneous treatments effects, and the proposed estimators are robust to a weak common proximal variable. We show that the asymptotic distribution of the step-wise estimator is a fractional Brownian motion process with long-range dependency. Third, we propose a bootstrap procedure to circumvent the inference difficulty caused by time-series dependence. Monte Carlo simulations show that our proposed estimator and inference framework work well in small samples, and our contributions are further illustrated by an empirical example based on unilateral divorce law reforms.

Introduction

This paper considers the identification and estimation of the following treatments effects panel model:

(1)

yit is a scalar response variable for unit i at time t, i.e., some observed social-economic outcome; Sit is a treatment taking place at time tSi and driven by the confounder Wit, and Dit is another treatment taking place at time tDi and driven by the confounder Xit. For example, the treatments (Sit, Dit) may be economic stimulus policies promoted by governments, while the confounders (Wit, Xit) may be social-economic factors driving both the growth of yit and the adoption of the policies (Sit, Dit) [1–4]. We assume that Wit and Xit are both of dimension one. Zit are dZ-dimensional exogenous control variables which are independent of the treatments (Sit, Dit) and the model error . and are individual fixed effect and time fixed effect. The parameters of interest are the heterogeneous treatment effects and . We consider three main challenges that commonly arise in empirical studies:

  1. The endogeneity problem. We assume that the confounders (Wit, Xit) are unobservable. Meanwhile, we relax the Gauss-Markov exogeneity assumption to allow the treatments to be correlated with the error term: for , which means that even if the confounders are observed and included in model (1), the model is still endogenous. In empirical studies, instrumental variable (IV) approaches are considered an effective way to handle the problem under this scenario [5]; however, good-quality IVs are hard to find [6–8]. Hence, we directly assume in this paper that no suitable IVs are available for empirical analysis.
  2. The heterogeneity problem. On the one hand, as shown in model (1), all parameters are time-varying and unit-varying (two-way heterogeneous), which implies that treating the treatments effects as constant will lead to biased estimates of ATE and ATT [9–11]. On the other hand, we further assume that the treatments (Sit, Dit) are adopted in a staggered fashion, i.e., there exist at least two such that and . It is well known that regional policy evaluation approaches such as Two-Way Fixed Effect Estimations (TWFE), Synthetic Control Methods (SCM) and Difference-in-Differences (DID), among others, lead to biased estimates under this scenario [12–19], e.g., the contamination bias [20].
  3. The absence of untreated units. When the social-economic policies (Sit, Dit) are one-size-fits-all, e.g., the government’s policies are carried out nationwide, no untreated units are available for identification and estimation of the treatments effects. Regional policy evaluation approaches such as TWFE, SCM and DID, among others, are no longer applicable under this scenario. Empirical researchers tend to find or construct untreated units from historical data or cross-national data (e.g., [21]); however, the validity of such units is questionable. Even if the policies are not one-size-fits-all, we still face the problem of selecting “good-quality” untreated units [22]. In fact, adopting unsuitable untreated units will lead to biased estimates [23,24].

Although these three problems are common in empirical studies, to the best of our knowledge the methodological literature that handles all three simultaneously is scarce. Most recent discussions have focused on the heterogeneity problem [20,25]; however, if untreated units are absent, the proposed methods are not applicable. Even if untreated units are present, estimating model (1) by these methods in the presence of endogeneity will still lead to biased estimates. How to identify and estimate heterogeneous treatments effects with endogenous, staggered treatments in time-series natural experiments that lack both untreated units and IVs remains an unsolved problem.

To overcome these difficulties, we propose a novel method. We propose a Conditional Mean Symmetry condition by projecting the counterfactual outcome onto the sub-linear space spanned by the proposed Common Proximal variable, where the Common Proximal variable is highly correlated with the unobservable confounders and independent of the error term. The Conditional Mean Symmetry condition is, on the one hand, shown to be equivalent to the Weak Index Restrictions used in the identification of semiparametric models [26,27]; on the other hand, it is also equivalent to the matching condition used in observational studies, which rules out confounding biases (see, e.g., [28–31]). Our framework builds on the pioneering work of Halbert White and colleagues [32–34], and we further illustrate the relationships between Granger (G-) causality, Sims causality and structural causality in time-series natural experiments without untreated units [35], as well as how one treatment effect can be disentangled from another through a forward point-wise estimation approach based on the proposed Conditional Mean Symmetry condition.

We contribute to the emerging literature on heterogeneous treatments effects in event studies with variation in treatment timing on two main fronts. First, we allow non-stationary potential outcomes. Stationarity is usually required in time-series natural experiments to guarantee valid inference [10,23,36,37]; however, social-economic outcomes are typically non-stationary, e.g., trend-stationary processes, and causal inference with non-stationary outcomes is difficult. In view of this, our proposed bootstrap inference framework allows the potential outcomes to be non-stationary processes, e.g., trend-stationary processes with long-range dependency, where the deterministic trend captures the long-term growth and the stationary component follows a long memory model. To the best of our knowledge, this is the first work to consider non-stationarity with long-range dependency in time-series natural experiments.

Second, the Conditional Mean Symmetry identification assumption proposed in this paper offers an identification paradigm that extends causal inference to empirical settings, such as national policy evaluations, where the DID framework is not applicable due to the absence of a valid comparison group. For the case with no untreated unit, [32–34,38,39] and [40] have considered the possibility of using only treated units to estimate causal effects. However, they did not consider scenarios involving multiple endogenous treatments and long-range dependence, both of which are common in real-world data. Our approach fills this gap.

The rest of the paper is organized as shown in the roadmap in Fig 1. The second part introduces the identification assumptions and describes how we can identify multiple treatments effects without untreated units under the Neyman-Rubin counterfactual framework. The estimation procedures and inference method for a single (univariate) time series model are gathered in the third part, and the fourth part extends the proposed methods to handle panel (multivariate) settings. The fifth part reports the results of the numerical examples and the final part concludes. All the proofs and additional analysis are collected in the Online Supplementary Materials S1 File-S5 File. To facilitate the use of our method, we provide an open-source R package endoLTE at https://github.com/codescollection/endoLTE that implements all estimators and inference procedures proposed in this paper.

Fig 1. Roadmap of identification, estimation and inference.

https://doi.org/10.1371/journal.pone.0347847.g001

Identification

We start our identification and estimation approach with a single (univariate) time-series model with homogeneous treatments effects. We then show how the proposed approach can be extended to the heterogeneous and staggered panel settings of model (1).

Global identification

Consider the following 4 specifications:

(2)(3)(4)(5)

(St, Dt) are treatments with the confounders (Wt, Xt), and Zt are exogenous control variables. DGP1 assumes that the confounder Wt is correlated with the treatment Dt, i.e., , while DGP2 assumes that Wt is independent of Dt. DGP3 and DGP4 each describe a single treatment effect model.

The Directed Acyclic Graphs (DAGs) for DGPs (2)–(5) are shown in Fig 2, corresponding to the adjacency matrices .

Fig 2. DAGs for the treatments effects models (2-5) (DGPs 1-4) with corresponding adjacency matrices.

https://doi.org/10.1371/journal.pone.0347847.g002

Definition 1 (Pseudo-subspace). If we let denote the n-th column of the matrix , then there exists a pseudo-subspace spanned by : for all nonzero constants , . We denote this pseudo-subspace as .

Assumption 1 (Rank condition) In models (2)–(5), for the random vectors , , , , and the matrix , denote for DGP1, for DGP2, for DGP3 and for DGP4. x1 and x2 are matrices of dimension , x3 is of dimension and x4 is of dimension T × 2. Then we have , , and .

The difference between the usual subspace and the pseudo-subspace defined in Definition 1 is that we do not allow kn = 0, so one can see from the adjacency matrices that while . Assumption 1 requires tS ≠ tD, i.e., we do not allow the treatments to take place at the same time, as happens when the governments’ policies are carried out simultaneously. In fact, if tS = tD, the treatments would be mixed with each other, making it impossible to distinguish one from another.
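Why simultaneous adoption breaks identification can be seen in a deliberately simplified sketch. It assumes, purely for illustration, that each treatment is a step indicator that switches on at its adoption date, which is a simplification of the paper's confounder-driven treatments; the helper names `step_indicator` and `gram_determinant` are hypothetical. When tS = tD the two indicator columns coincide, the Gram matrix of the regressors (intercept, St, Dt) becomes singular, and no regression can separate the two effects.

```python
def step_indicator(T, t_start):
    """Return a 0/1 list of length T that switches on at index t_start."""
    return [1 if t >= t_start else 0 for t in range(T)]


def gram_determinant(T, t_s, t_d):
    """Determinant of X'X for X = [1, S_t, D_t] with step-indicator treatments.

    The step-indicator form is an illustrative assumption, not the paper's DGP.
    """
    cols = [[1] * T, step_indicator(T, t_s), step_indicator(T, t_d)]
    g = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    # Cofactor expansion of the 3x3 Gram matrix along its first row.
    return (g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
            - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
            + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0]))


staggered = gram_determinant(T=10, t_s=4, t_d=7)     # distinct adoption dates
simultaneous = gram_determinant(T=10, t_s=4, t_d=4)  # same adoption date
print(staggered, simultaneous)
```

A zero determinant in the simultaneous case means the design matrix has deficient rank, which is the regression counterpart of the statement that the two treatments cannot be told apart.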

Proposition 1. If Assumption 1 is satisfied, then the treatment effect and the treatment effect can be separately identified from each other under DGP2 with the adjacency matrix but not under DGP1 with the adjacency matrix .

Why and how it is possible to identify treatment effect in untreated units absent time-series natural experiments

As shown in Proposition 1, model (2) is unidentifiable, hence only the following treatments effects model (6) is identifiable. We now start with this univariate time-series model with homogeneous coefficients, and show later how the estimation and inference procedure can be extended to handle heterogeneous panel settings.

(6)

where ,   Zt are exogenous control variables independent of and . The parameters of interest are , but (Dt, St) are endogenous due to the unobservability of (Wt, Xt) and . Instead of finding IVs for the endogenous treatment variables Dt and St, we propose to find (or construct) a Common Proximal Variable satisfying the following Assumptions 2–3 and Definition 2. On this basis, for model (6), we consider the Bernstein expansions of the random variables , and on the Common Proximal Variable of order 2 [41]:

(7)(8)

where , , , are called Bernstein coefficients and the expansions (7)–(8) we get are called Bernstein polynomials of on , where . Note that the first order derivative of y(t) and with respect to are

Assumption 2 (i) The random variables W and are of high correlation, X and are also of high correlation, e.g., is a proximal variable for W and X. (ii) is independent of , and

Assumption 3 There exists a continuous function , p ≥ 1, such that: (i) , , , Cp is the Lebesgue space with defined on , ut, wt are martingale processes independent of and E(u) = 0, E(w) = 0. (ii) For the Bernstein expansions, (W,X) and are of high correlations respectively, where . (iii) The composite error is independent of and Zt, where .

Definition 2 (Common Proximal Variable). For model (6), a random variable satisfying Assumptions 2–3 is called a Common Proximal Variable (CP).
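To make the order-2 Bernstein expansions and their first derivatives concrete, the following minimal sketch evaluates a Bernstein polynomial and its derivative. It assumes the Common Proximal Variable has been rescaled to the unit interval, and the function names and coefficient values are hypothetical illustrations rather than the paper's estimates. The derivative uses the standard identity that differentiating a degree-n Bernstein polynomial gives a degree-(n-1) Bernstein polynomial whose coefficients are n times the first differences of the original coefficients.

```python
from math import comb


def bernstein_basis(k, n, x):
    """k-th Bernstein basis polynomial of degree n, evaluated at x in [0, 1]."""
    return comb(n, k) * x ** k * (1 - x) ** (n - k)


def bernstein_poly(coefs, x):
    """Evaluate sum_k coefs[k] * b_{k,n}(x), where n = len(coefs) - 1."""
    n = len(coefs) - 1
    return sum(c * bernstein_basis(k, n, x) for k, c in enumerate(coefs))


def bernstein_derivative(coefs, x):
    """First derivative: n * sum_k (coefs[k+1] - coefs[k]) * b_{k,n-1}(x)."""
    n = len(coefs) - 1
    diffs = [n * (coefs[k + 1] - coefs[k]) for k in range(n)]
    return bernstein_poly(diffs, x)


# Illustrative order-2 Bernstein coefficients (made-up numbers, not estimates).
beta = [0.0, 0.0, 1.0]   # these coefficients reproduce x**2 on [0, 1]
x = 0.3                  # a rescaled value of the proximal variable (assumption)
value = bernstein_poly(beta, x)
slope = bernstein_derivative(beta, x)
print(value, slope)
```

With three coefficients the expansion is of order 2, so each expanded variable is summarised by three Bernstein coefficients, and its derivative is a linear function of the rescaled proximal variable.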

Remark 1 (Doob decomposition theorem). For most of the social-economic outcomes yt, if yt is changing with time, i.e., yt is a function of t, as we defined in (7); or if there is theoretical and empirical evidence that Wt, Xt and Zt change with time, we would then take satisfying Assumptions 2–3 without loss of generality. In particular, if both and are sub-martingales for , and we have

(9)

where is a continuous and non-decreasing function, the coefficients are real non-zero numbers , E(u)=E(w)=0, then (9) is exactly the famous Doob-Meyer decomposition: the sub-martingales Wt and Xt can be decomposed into a non-decreasing process plus martingales ut and wt respectively by Assumption 3(i).
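The decomposition in Remark 1 can be illustrated with a small simulation. This is a sketch under stated assumptions: the non-decreasing function is taken to be g(t) = t/T, the martingale is a Gaussian random walk, and the names `simulate_confounder` and `correlation` as well as all parameter values are hypothetical. It shows how a confounder built as a scaled non-decreasing trend plus a martingale can be compared with the trend that serves as the proximal variable.

```python
import random


def simulate_confounder(T, a, sigma, seed=0):
    """Simulate W_t = a * g(t) + u_t with g(t) = t / T and u_t a random walk.

    Both g and the Gaussian increments are illustrative assumptions.
    """
    rng = random.Random(seed)
    trend = [a * (t / T) for t in range(1, T + 1)]
    u, level = [], 0.0
    for _ in range(T):
        level += rng.gauss(0.0, sigma)  # zero-mean increments make u_t a martingale
        u.append(level)
    w = [m + e for m, e in zip(trend, u)]
    return trend, u, w


def correlation(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


trend, u, w = simulate_confounder(T=200, a=5.0, sigma=0.1, seed=1)
print(correlation(trend, w))  # how closely the trend tracks the simulated confounder
```

Increasing `sigma` relative to `a` weakens the link between the trend and the confounder, which is one way to experiment with a weak proximal variable in simulations.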

Following [42] and [43], we let denote the potential outcome under different realizations of , so we observe for 1 ≤ t < tS, for and for tD ≤ t < T. Then yt in (6) can be rewritten as

(10)

for any . for then denotes the counterfactual outcome if there is no treatment St; for denotes the counterfactual outcome if there are no treatments (St, Dt); for denotes the counterfactual outcome if there is no treatment St [44,45].

Definition 3 (Sub-sample expectation operator on the Hilbert space). Let be the Hilbert space generated by the random process , contains all the functions for and . Define a family of operators on as , is the size of the set ℓ. Then is called the sub-interval expectation operator on the sub-space .

Definition 4 (Mean symmetry and conditional mean symmetry). For any random variable , if for where , , then we say that x satisfies mean symmetry. The conditional mean symmetry is then defined by projecting x onto another random variable such that .
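As a minimal sketch of Definitions 3 and 4, the sub-sample expectation operator can be read as an average over an index set, and mean symmetry as equality of two such averages. The function names, the tolerance argument and the toy series below are hypothetical; the conditional version additionally requires the paper's projection step, which this sketch does not attempt to reproduce.

```python
def subsample_mean(x, index_set):
    """Average of x over the time indices in index_set (sub-sample expectation)."""
    idx = list(index_set)
    if not idx:
        raise ValueError("index_set must not be empty")
    return sum(x[t] for t in idx) / len(idx)


def mean_symmetric(x, pre, post, tol=1e-9):
    """True if the sub-sample means over pre and post agree up to tol."""
    return abs(subsample_mean(x, pre) - subsample_mean(x, post)) <= tol


pre, post = range(0, 4), range(4, 8)          # illustrative pre/post index sets
cycle = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
trend = [float(t) for t in range(8)]

print(mean_symmetric(cycle, pre, post))   # a stable cycle has equal sub-sample means
print(mean_symmetric(trend, pre, post))   # a trending series does not
```

The trending series fails the check in its raw form, which is the situation in which a projection onto a suitable conditioning variable is needed before the sub-sample means can be compared.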

Assumption 4 (Long range dependency) In model (6), as for and , the parameter is referred to as the Hurst parameter, is stationary. Wt, Xt and Zt are stationary or trend stationary processes, hence yt is a stationary or trend stationary process with long range dependency.
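For intuition about long-range dependency, a standard textbook example is fractional Gaussian noise, the increment process of fractional Brownian motion. The sketch below is that textbook example and not necessarily the exact condition stated in Assumption 4; the function name `fgn_autocov` and the chosen Hurst values are illustrative. Its autocovariance at lag k is 0.5 * (|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)) for unit variance, which decays like H(2H-1)k^(2H-2); for H above 1/2 the autocovariances are not summable.

```python
def fgn_autocov(k, hurst, sigma2=1.0):
    """Autocovariance of fractional Gaussian noise at lag k for Hurst index hurst."""
    k = abs(k)
    p = 2 * hurst
    return 0.5 * sigma2 * ((k + 1) ** p - 2 * k ** p + abs(k - 1) ** p)


def partial_sum(n, hurst):
    """Sum of autocovariances over lags 1..n."""
    return sum(fgn_autocov(k, hurst) for k in range(1, n + 1))


# Short memory benchmark (H = 0.5) versus long memory (H = 0.8).
for n in (10, 100, 1000):
    print(n, partial_sum(n, 0.5), partial_sum(n, 0.8))
```

With H = 0.5 the increments are uncorrelated and the partial sums stay at zero, whereas with H = 0.8 they keep growing with the number of lags, which is the defining feature of long memory.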

Assumption 5 (Conditional Mean Symmetry, CMS) There exist series of random sequences and such that: (i) conditional parallel: , where is the counterfactual outcome sequence; and (ii) conditional mean symmetry: , where , , .

If we let be a martingale process with respect to the -filtration , r ≤ t, then satisfies Assumption 5(i). This is because and , hence we can get Assumption 5(i) with . Under this scenario, Assumption 5(ii) is also satisfied. To see this, suppose that there exists an affine function such that . Note that if is a stationary process, by the von-Neumann Ergodic Theorem (see [46,47]), define and , we can get

which exactly implies Assumption 5(ii) with and . Assumption 5 indicates that is an ergodic but non-mixing process on the Hilbert space [48]. This implies that the random sequence could be chosen so that is a martingale process with , e.g., the random sequences could be the untreated units as shown in Fig 2 (grey solid lines). This reminds us that, in time-series natural experiments, one criterion for judging whether the selected untreated units are “good enough” is that the difference between the counterfactual outcome and the untreated unit should be a martingale process over the whole time interval.

For time series, structure change (break) is usually defined as a before-after change, i.e., there is a spontaneous trend, slope, structure or regime change before and after some point of time; while treatment effect is defined as a with-without change, i.e., the growth path of the observed yt with Dt (or St), ceteris paribus, would be totally different without Dt (or St). Hence treatment effect is defined in a counterfactual way, while structure change or other distribution shifts effect is not [28–30,49,50]. This is also the main difference between Granger Causality, Sims Causality and the Neyman-Rubin counterfactual framework (e.g., [35,38,51,52]). Fig 3 illustrates the differences through hypothetical DGPs. It can be seen that without further restrictions (assumptions), structure change has no direct relationship with treatment effect. A (non-)zero treatment effect does not necessarily imply a (non-)zero structure change, and vice versa. Empirical researchers may hold the intuition that non-zero treatment effect implies non-zero structure change, as illustrated in Fig 3c. However, this is not always the case, see Fig 3b and (d) for examples.

Fig 3. Comparisons between structure change and treatment effect through hypothetical examples (a-d).

https://doi.org/10.1371/journal.pone.0347847.g003

To make the interpretation and credibility of our assumptions transparent, an intuitive illustration of Assumption 5 is given in Figs 4 and 5. It can be seen that the random sequences and do not satisfy Assumption 5 (the green dots in Fig 4 are not equal to each other) until we project them onto the space spanned by (the red dots are equal to each other), where we take . Note that in this paper we allow the observed outcome to be non-stationary (e.g., a trend-stationary process), while the literature usually requires (conditional) stationarity for identification and inference in time-series natural experiments; see, e.g., [53,54].

Fig 4. Hypothetical illustration example of Conditional Mean Symmetry (Assumption 5).

The middle panel shows Assumption 5 (i), the right panel shows Assumption 5 (i) and (ii). The blue solid line shows the original (before-projection) counterfactual outcome , the black solid line shows the original (before-projection) random sequence , while the blue dashed line shows the after-projection counterfactual outcome and the black dashed line shows the after-projection sequence . The red solid line shows the difference of before-projections , while the red dashed line shows the difference of after-projections . The green dots show the sub-sample means of before-projections: and , while the red dots show the sub-sample means of after-projections: and , , , . The vertical line shows the time when the treatment takes place.

https://doi.org/10.1371/journal.pone.0347847.g004

Fig 5. After-projection outcomes (left panel), and distributions of DGPs (b) and (c) shown in Fig 3 (right panel).

We take . For the left panel, the dark solid line shows the after-projections of the treated outcome , the red dashed line shows the after-projections of the untreated outcome , and the grey solid line shows one possible realization of . For the right panel, the red solid line shows the distribution of the untreated outcome (after-projection) for the pre-treatment period , while the red dashed line shows the distribution of the untreated outcome (after-projection) for the treatment period . The vertical line shows the time when the treatment takes place. For expositional convenience, we only consider model (6) with single treatment St.

https://doi.org/10.1371/journal.pone.0347847.g005

Remark 2 (Matching condition) Assumption 5 rules out confounding biases caused by (Wt, Xt) and guarantees that what is identified are pure treatments effects. To see this, consider the DGPs (a-d) shown in Fig 3; the after-projections of the potential outcomes are shown in Fig 5. It can be seen that DGP (b) or (d) does not satisfy Assumption 5(ii) (Conditional Mean Symmetry fails because there is a structure change). Neglecting the distribution shifts shown in the right panel of Fig 5(a) leads either to a false positive, i.e., an effect that stems from structural shifts is misleadingly interpreted as evidence of a treatment effect, or to a false negative, as in DGP (b) or (d) of Fig 3, which satisfies Assumption 5(i) but not (ii).

Assumption 5 implies that the treatment effect can be identified if and only if the distribution of the untreated outcome (after-projection) for the pre-treatment period shares a common support with that of the treatment period (Fig 3a and (c) shown in Fig 5b). As stressed by [28] and [32], reliable estimates are obtained when the support for the treated regime coincides with or is a subset of that for the non-treated regime, and not necessarily otherwise. This is the core Matching Condition used in observational studies (e.g., [28,29]). Assumption 5 then permits us to avoid confounding the effect of treatment with shifts in the distribution of other causes.

Remark 3. Recall the general panel data model . The time-homogeneity condition assumes that for all , where Bit is the explanatory variable and denotes distributional equivalence; the disturbance is then conditionally strictly stationary [55,56]. When , the time-homogeneity condition, under our scenario (6), is equivalent to the assumption that the conditional distribution of the counterfactual untreated outcome given (Xt, Wt) does not change before and after the treatment, as shown in the right panel of Fig 5(b). However, Assumption 5 does not necessarily imply the time-homogeneity condition, so Assumption 5 is less restrictive. Hence, with the distribution of factors other than Dt (or St) not varying over time, the changes in Dt (or St) over time can help identify the ceteris paribus effect of Dt (or St) on yt. This is exactly why and how we can achieve causal inference in time-series natural experiments through Assumption 5.

Assumption 6 (SUTVA and no anticipation) We assume that the stable unit treatment value assumption (SUTVA) holds and that there are no anticipation effects.

Theorem 1 (Identifications). For model (6) with , , and : (i) Before-After (structure change effect): if Assumption 5(i) is satisfied, then we have

(ii) With-Without (treatment effect): if Assumptions 5(i)-(ii) are both satisfied, we will then get

The Before-After part (i) identifies the structure change effect, while the With-Without part (ii) identifies the treatment effect.

We are now able to give a formal distinction between treatment effect and structure change effect:

Definition 5 (Causal field and multiverse). For model (6), the random process triplet (yt, Dt, Xt) is called a causal field where E(y|D,X) = E(y|X) and ; while the random process triplet (yt, Dt, Wt) is called a structure change field where and . Similarly, the triplet (yt, St, Wt) is a causal field and the triplet (yt, St, Xt) is a structure change field. For the pairs (yt, St) and (yt, Dt) without further information, we cannot tell whether they are causal fields or structure change fields. The quadruplet (yt, Dt, St, Xt) is called a time-bowl (the quadruplet (yt, Dt, St, Wt) is also a time-bowl), and the fields under all time-bowls constitute a multiverse:

The time-bowl and multiverse concepts are borrowed from the 2023 DC movie The Flash. A time-bowl corresponds to a specific universe, and the multiverse is made up of different specific universes; the Flash (the protagonist of the movie) can enter any time-bowl in the multiverse using his superpower. By analogy, we econometricians are like the Flash: our superpowers are the different model assumptions under which we can disentangle one time-bowl (e.g., a causal field) from another (e.g., a structure change field).

It can be seen from Theorem 1 and Definition 5 that the parameters of interest in model (6) have different meanings under different assumptions. The before-after part under Assumption 5(i) identifies a structure change effect, while the with-without part under Assumptions 5(i)-(ii) identifies a causal effect. This finding implies that the parallel assumption alone in DiD settings is not sufficient to disentangle treatment effect from structure change effect. The presence of a non-zero estimated effect under the parallel trends assumption does not necessarily imply a causal treatment effect. An alternative explanation is that the treated units were undergoing a spontaneous structural change, while the untreated units were not. Under Assumption 5, a (non-)zero treatment effect implies a (non-)zero structure change as shown in Fig 3(a) and (c), but not vice versa as shown in Fig 3(b) and (d). In observational studies, the triplet (yt, Dt, Xt) or (yt, St, Wt) describes an “economic story”, which ascribes a causal meaning to the empirical findings. Without such a “story” (the relationship between yt and Dt (St) given Xt (Wt)), we can hardly claim that what we finally get is actually a causal relationship.

The plausibility of the CMS assumption

To justify the Conditional Mean Symmetry (CMS) assumption and demonstrate its plausibility, we illustrate its logic step-by-step using a specific economic example: the impact of a national minimum wage policy on employment. Suppose a national minimum wage policy (the treatment variable Dt) was enacted on January 1, 2000. We want to evaluate the causal effect of this policy on the youth employment rate (yt). This is a classic time-series natural experiment with no untreated units, as the entire country is affected.

The challenge we now face is that the employment rate is influenced not only by the policy but also by numerous unobservable confounding factors, such as: (i) Macroeconomic cycles: Employment is high during economic booms and low during recessions. (ii) Industrial structure adjustments: An increase in the service sector’s share and a decrease in manufacturing also affect overall employment. (iii) Demographic changes: Changes in the proportion of the youth population. These factors simultaneously influence whether the policy might be considered (as economic pressures form the backdrop for policy enactment) and the employment outcome.

Step 1: Finding the “Common Proximal Variable (CP)” . According to Remark 1, under specific economic scenarios, we can take (the time trend) as the CP variable. The rationale for this choice is that the confounding factors (economic cycles, industrial structure, demographics) all change systematically over time. By Assumptions 2–3, we can use a function of time (e.g., ) to roughly capture these trends, i.e., we assume that the macroeconomic condition Wt can be decomposed as , and the industrial structure Xt can be decomposed as . Here, is a smoothly varying function of time, while ut and wt are short-term random fluctuations around this trend and are independent of the error term .

Step 2: Applying CMS(i) (Conditional Parallelism). In our example, CMS(i) implies that yt(0) is the employment rate “if there were no minimum wage policy”. is the “shadow variable” used for adjustment. It can be understood as the part of the outcome highly correlated with the CP variable . We can simplify this to the part of the employment rate explained by the time trend. here can be constructed from .

The economic meaning of CMS(i) is: After removing the part explained by the time trend (and the confounding factors associated with it) , the behavior (e.g., its expected value) of the remaining “pure random shock” part is the same before and after the policy implementation. In layman’s terms: Without the policy, employment would fluctuate due to economic cycles. CMS(i) assumes that the “pattern” of these fluctuations is the same before and after the policy. For example, during an economic upturn, employment would be x% above the trend line; during a downturn, it would be y% below the trend line. This “pattern of deviation from the trend” is stable.
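Steps 1 and 2 can be sketched on simulated data. The example below is an illustration under stated assumptions: the no-policy employment series is a made-up linear trend plus an 8-period cycle, the CP variable is the time index, and the projection is taken to be an ordinary least-squares line; `ols_line` and all numbers are hypothetical. In real data the no-policy outcome is not observed after the policy date, so this comparison is only available in simulation.

```python
import math


def ols_line(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean(values):
    return sum(values) / len(values)


T, t_policy = 64, 32                 # illustrative sample size and policy date
time = list(range(T))                # CP variable: the time trend (assumption)
# Hypothetical no-policy employment rate: linear trend plus a regular cycle.
y0 = [60.0 + 0.05 * t + math.sin(2 * math.pi * t / 8) for t in time]

intercept, slope = ols_line(time, y0)
resid = [y - (intercept + slope * t) for t, y in zip(time, y0)]

# Gap between post- and pre-policy sub-sample means, before and after detrending.
raw_gap = mean(y0[t_policy:]) - mean(y0[:t_policy])
resid_gap = mean(resid[t_policy:]) - mean(resid[:t_policy])
print(round(raw_gap, 3), round(resid_gap, 3))
```

The raw series has clearly different pre- and post-policy means because of the trend, while the deviations from the fitted trend have nearly equal means in the two sub-periods, which is the stable pattern of deviation that CMS(i) describes.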

Step 3: Applying CMS(ii) (Mean Symmetry). In our example, can be roughly understood as the “potential employment level driven by the long-term trend”. CMS(ii) assumes that: The employment level determined by the long-term trend itself did not undergo a structural break around the time of the policy. That is, we cannot say that the “long-term trend line” itself had a discontinuity in its slope or intercept right after January 1, 2000, compared to before. The trend line is continuous and smooth across the policy implementation point.

In this specific example, the plausibility of the CMS assumption depends on judgment of the economic context:

  • Plausibility of CMS(i): The assumption that the “pattern of deviation of employment from its long-term trend” is stable is a common hypothesis in macroeconomics. Business cycles (boom-recession-recovery) do exhibit a certain regularity and cyclicality [2]. As long as there are no other major structural changes (e.g., war, technological revolution), this pattern of fluctuation around the trend is likely to be similar.
  • Plausibility of CMS(ii): The assumption that the “long-term trend itself” did not have a structural break around the policy implementation is precisely what we need to verify. The purpose of the minimum wage policy is to change the employment level, so it could very well cause a change in the intercept or slope of the trend line. CMS(ii) essentially states that the “trend” part captured by does not include the effect of the policy. The entire policy effect is intended to be captured by the parameter we ultimately estimate, .

The CMS assumption is a stronger assumption than the parallel trends assumption used in DiD settings in that it attempts to substitute cross-unit comparability with within-unit stability over time; however, it is also more flexible because it does not require the existence of a unit unaffected by the treatment.

Empirical manifestations of CMS violations

Understanding how the CMS assumptions could be violated in real-world data is crucial for applied researchers. Below, we describe what empirical violations would look like for each component of the CMS assumption, using concrete economic examples.

Violations of CMS(i).

  • Pattern 1: Time-varying volatility in residuals. Imagine plotting the residuals (where is the projection onto the CP variable) over time. A violation of CMS(i) would show: For the pre-treatment period, residuals fluctuate within a stable range (e.g., ±1%); while for the post-treatment period, the pattern of fluctuations changes fundamentally—perhaps they become more volatile, or the auto-correlation structure changes. In the minimum wage example, suppose before 2000, employment deviations from trend were mild and short-lived (2–3 quarters). After 2000, these deviations become persistent and last 5–7 years. This suggests the underlying economic dynamics have changed, violating the “same fluctuation pattern” assumption.
  • Pattern 2: Structural break in the relationship with the CP. Another violation occurs if the relationship between the counterfactual untreated outcome and the CP variable changes over time. This would appear as: For the pre-treatment period, follows a function ; however for the post-treatment period, the function shifts to . Consider the divorce law reform example from the later empirical analysis part of the paper. If the relationship between divorce rates and time (the CP variable) fundamentally changed after the 1997 Asian financial crisis—not just a level shift but a change in how divorce responds to economic conditions—this would violate CMS(i). The “adjustment” that worked before the crisis would no longer be appropriate after.
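The two CMS(i) patterns above suggest simple descriptive checks, sketched here on toy numbers. The helper names and the data are hypothetical, and the checks are descriptive comparisons rather than formal tests: Pattern 1 compares the variance and lag-1 autocorrelation of the residuals before and after the treatment date, and Pattern 2 compares the fitted slope on the CP variable in the two sub-periods.

```python
def variance(x):
    """Population variance of a sequence."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a sequence."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


# Pattern 1 (toy residuals): same volatility, but deviations become persistent.
pre_resid = [1, -1, 1, -1, 1, -1, 1, -1]
post_resid = [1, 1, 1, 1, -1, -1, -1, -1]
print(variance(pre_resid), variance(post_resid))
print(lag1_autocorr(pre_resid), lag1_autocorr(post_resid))

# Pattern 2 (toy outcome): the slope on the CP variable changes after the break.
cp_pre, cp_post = [0, 1, 2, 3], [4, 5, 6, 7]
y_pre = [2 + 0.5 * t for t in cp_pre]
y_post = [2 + 1.5 * t for t in cp_post]
print(ols_slope(cp_pre, y_pre), ols_slope(cp_post, y_post))
```

In the toy residuals the variance is unchanged while the autocorrelation switches sign, so looking at volatility alone would miss the change in dynamics described in Pattern 1.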

Violations of CMS(ii):.

  • Pattern 1: Level shifts in the projected component. The series shows a discrete jump exactly at the treatment time that cannot be explained by the treatment itself. In the minimum wage case, suppose captures the “long-term trend” in employment driven by demographics. If a new demographic survey methodology was introduced in 2000 that redefined how youth population is measured, the trend line would shift mechanically—violating CMS(ii) even if the minimum wage policy had no effect.
  • Pattern 2: Slope changes in the projected component. The rate of change of shifts at the treatment boundary. In the divorce law example, suppose the CP variable is time t, capturing the natural upward trend in divorce rates due to secular social changes. If the 1997 financial crisis permanently accelerated the pace of social change (making people more individualistic faster), then the slope of would increase after 1997. This would violate CMS(ii) because the used to adjust pre-treatment data would not be appropriate post-treatment.
  • Pattern 3: Correlation between and the treatment indicator. becomes correlated with Dt or St after controlling for . This would appear as: In a regression of on Dt and , the coefficient on Dt is statistically significant. This indicates that the “pure trend” component is actually contaminated by the treatment itself.

In summary, the violations of CMS(i) come from: units learn from the treatment and respond differently to subsequent shocks; the treatment fundamentally alters how the economy/society responds to other factors; the set of units changes over time (in panel settings); and treatment effects spill over to affect the untreated state. The violations of CMS(ii) would come from: other policy changes or shocks coinciding with the treatment; the treatment itself alters the long-term trends it was supposed to be independent of; and selecting a proximal variable that is itself affected by the treatment.

Diagnosing the CMS assumption

When untreated units (control units) are present, the random sequence could be selected as the untreated units, and the CMS assumption could be tested by applying the empirical process-based methods proposed by [57] to our constructed residual term on the pre-treatment period where we assume . Truncate the sample to the period and (the “faked pre- and post-treatment” interval); if the estimated pseudo effect or is significantly nonzero, then CMS is not satisfied. However, when untreated units (control units) are absent, CMS cannot be tested directly. Two approaches can be adapted to construct indirect tests for CMS:

  1. Placebo tests using a “fake treatment time.” We know no actual treatment occurred on the “faked pre- and post-treatment” interval and , so the true treatment effect should be zero. Apply the paper’s two-step estimation procedure to this hypothetical interval to estimate the “treatment effect.” If we obtain estimates that are statistically significantly different from zero, this provides strong evidence that the CMS assumption (especially CMS(i)) might be violated: even in the absence of treatment, our method falsely identifies an “effect,” suggesting that the projection mechanism used to construct the counterfactual is biased.
  2. Testing implications of Conditional Mean Independence (CMI). Theorem 3 of the paper states that the proposed two-step estimation procedure implies conditional mean independence: . This means that, conditional on , the treatment variable St should be unrelated to potential outcomes. While potential outcomes are unobservable, we can test whether St is related to some functions of . In the sample including the true pre-treatment period () and the post-treatment period (), we can test whether is constant. A non-constant relationship would indicate a violation of CMI and thus CMS.
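The placebo idea in item 1 can be illustrated with a short sketch. It is a simplified stand-in only: the paper's two-step estimator is replaced here by an ordinary least-squares regression on a constant, a linear CP (time), and a fake post-period dummy, and the names `placebo_t_stat` and `fake_time` are hypothetical. The data are synthetic.

```python
import numpy as np

def placebo_t_stat(y, cp, fake_time):
    """t-statistic of a fake post-period dummy in an OLS of y on [1, cp, dummy].

    Assumption for illustration: OLS replaces the paper's two-step estimator.
    y and cp should cover pre-treatment periods only.
    """
    y = np.asarray(y, dtype=float)
    cp = np.asarray(cp, dtype=float)
    dummy = (np.arange(len(y)) >= fake_time).astype(float)
    X = np.column_stack([np.ones_like(y), cp, dummy])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                 # homoskedastic error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[2] / np.sqrt(cov[2, 2])

rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)
stable = 0.05 * t + rng.standard_normal(100)     # trend only, no break
broken = stable + 5.0 * (t >= 50)                # hidden level shift at t = 50
t_stable = placebo_t_stat(stable, t, fake_time=50)
t_broken = placebo_t_stat(broken, t, fake_time=50)
print(t_stable, t_broken)
```

A large placebo t-statistic, as for the `broken` series, signals that the projection does not reproduce the counterfactual. The i.i.d. standard error used here ignores long-range dependence, so in practice it would be replaced by the bootstrap inference described later in the paper.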

Choice of the CP variable.

  1. (i) should be a sufficient statistic of the confounders or should be highly correlated with the confounders. This guarantees that the CP variable is not weak.
  2. (ii) should be independent of , which guarantees that the CP variable is valid.

These two requirements imply that for trend stationary social-economic outcome yt, the CP could be selected as (i) time t; or (ii) common factor ft extracted from , where Zit are multiple macroeconomic variables (such as GDP, consumption, and investment across regions) correlated with yt but independent of the policy, then is a good CP variable as a measure of the “economic cycle”.
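As a minimal sketch of option (ii), a common factor can be extracted as the first principal component of the standardized macroeconomic series. The function name `first_common_factor` and the simulated panel are assumptions made for illustration; the paper does not prescribe a particular factor-extraction routine in this passage.

```python
import numpy as np

def first_common_factor(Z):
    """First principal component of a T x K matrix of series (columns)."""
    Z = np.asarray(Z, dtype=float)
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)    # standardize each series
    U, s, Vt = np.linalg.svd(Zs, full_matrices=False)
    return U[:, 0] * s[0]                        # identified up to sign and scale

rng = np.random.default_rng(1)
T, K = 200, 6
f = np.cumsum(rng.standard_normal(T))            # latent "economic cycle"
loadings = rng.uniform(0.5, 1.5, size=K)
Z = np.outer(f, loadings) + 0.3 * rng.standard_normal((T, K))
f_hat = first_common_factor(Z)
corr = np.corrcoef(f, f_hat)[0, 1]
print(abs(corr))
```

Because a principal component is only identified up to sign and scale, the absolute correlation with the latent factor is the relevant check in a simulation.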

Researchers should select the CP variable by integrating economic theory with the context of the research question. Table 1 provides some concrete examples.

Table 1. Candidate Common Proximal variables for typical empirical settings.

https://doi.org/10.1371/journal.pone.0347847.t001

Time is a natural and convenient CP because many confounders trend over time. However, there are several scenarios where time fails to satisfy the CP assumptions: (i) Treatment-induced structural break in the trend: If the treatment itself alters the long-term trajectory of the confounders (or the outcome’s relationship with time), then using a simple time trend violates CMS(ii); (ii) Non-linear or non-smooth confounding: If confounders evolve in a way that cannot be captured by a smooth function of time (e.g., abrupt shifts due to wars, natural disasters, or technological breakthroughs), a simple time trend will not proxy them well. The residuals ut and wt would then be correlated with time, violating the martingale difference assumption; (iii) Multiple confounders with different trends: When there are several confounders with different time profiles (e.g., one increasing linearly, another following a U-shape), a single time trend cannot simultaneously capture all. The CP would be misspecified; (iv) Seasonality or high-frequency fluctuations: In quarterly or monthly data, time may not capture seasonal patterns or business cycle frequencies that act as confounders. In such cases, a more sophisticated CP (e.g., a business cycle indicator) is needed; (v) When the treatment timing coincides with another aggregate shock: If the treatment date coincides with a major event (e.g., a financial crisis), the time trend alone cannot distinguish between the treatment effect and the crisis effect. The crisis would appear as a structural break in the trend, violating CMS(i) and (ii).

Multiple CPs are admissible, and researchers can attempt to select the “best” one—or a combination of them. However, within the specific CMS framework of this paper, the theory is developed for a univariate CP . If multiple CPs are available, the researcher faces a model selection problem: choosing which variable (or which combination) best satisfies the CMS assumption. This can be left for future research.

Estimation by weak index restrictions and inference with long range dependency

Substituting (9) into model (6) and estimating model (6) directly is not valid because , . In this section, we propose a novel method to deal with this issue without IVs.

A two-step estimation approach by weak index restrictions.

Step 3.1. (Estimation of ) Based on the Bernstein expansions (7)–(8), we will get

By symbolic computations, see [58,59], there exists a continuous function such that the relationship between and satisfies:

is the Bernstein coefficient defined in (7)–(8) for and . Estimate (7)–(8) by OLS to get the estimators respectively for b and , and input all the estimators into the equation , we will get

Then further consider the following auxiliary regression:

(11)(12)(13)(14)(15)(16)

W is an unobserved random shock. Like Indirect Inference [60], we do not require to be consistent for B in the auxiliary regression (11). Substitute (6) into , we will then get

for , and , the last equality holds true () by (6) and Assumption 2 (ii). Notice that by Assumption 3, model (6) could be rewritten as

(17)

where , , , thereby we can get

Estimate (17) by semiparametric methods, under Assumption 2 (ii) we will get and as , see, e.g., [61]. Substitute these estimators into , we will finally get

Assumption 6 As , we have (i) ;

  1. (ii) and ;
  2. (iii) , , , and ;
  3. (iv) , .

Theorem 2. (UNBIASEDNESS AND CONSISTENCY) Under the Assumptions 1–5 and Assumption 6, , and .
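The Bernstein-expansion regressions that Step 3.1 estimates by OLS can be sketched as follows. This is an illustration under stated assumptions: the CP is rescaled to [0, 1], the degree `p` is chosen by the user, and the helper names `bernstein_basis` and `fit_bernstein` are hypothetical. It covers only the first-stage projection, not the full two-step estimator.

```python
import numpy as np
from math import comb

def bernstein_basis(x, p):
    """T x (p+1) matrix of degree-p Bernstein polynomials evaluated on [0, 1]."""
    x = np.asarray(x, dtype=float)
    k = np.arange(p + 1)
    binom = np.array([comb(p, j) for j in k], dtype=float)
    return binom * x[:, None] ** k * (1.0 - x[:, None]) ** (p - k)

def fit_bernstein(y, cp, p):
    """OLS of y on the Bernstein basis of the rescaled CP variable."""
    cp = np.asarray(cp, dtype=float)
    x = (cp - cp.min()) / (cp.max() - cp.min())  # map the CP to [0, 1]
    B = bernstein_basis(x, p)
    coef, *_ = np.linalg.lstsq(B, np.asarray(y, dtype=float), rcond=None)
    return coef, B @ coef

cp = np.arange(100, dtype=float)                 # time as the CP variable
y = np.sin(2.0 * np.pi * cp / 99.0)              # smooth function of the CP
coef, fitted = fit_bernstein(y, cp, p=8)
basis = bernstein_basis(np.linspace(0.0, 1.0, 11), 8)
max_err = np.max(np.abs(fitted - y))
print(max_err)
```

The Bernstein basis forms a partition of unity (each row sums to one), which keeps the least-squares problem well conditioned for moderate degrees.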

STEP 3.2. (ESTIMATION OF ) The estimation approach is similar to Step 3.1 except for replacing the time interval in (11) with , and replacing with . The estimator we finally get is

where , and , with , , , , and . The coefficients are

Under the Assumptions 1–5, by Theorem 1 and similar to Theorem 2, we can get , and as long as conditions similar to these in Assumption 6 are satisfied, i.e., (i) ;

  1. (ii) and ;
  2. (iii) , , , and ;
  3. (iv) , .

The following statement clarifies why and how the above proposed estimation approach satisfies Assumption 5 and guarantees Theorem 1:

Theorem 3. (CONDITIONAL MEAN INDEPENDENCE, CMI) For the 2-steps estimation approach, we have , where for Step 3.1; and , where for Step 3.2.

Interestingly, Theorem 3 indicates that given , which is constructed by the CP , the treatments (St, Dt) are nearly randomly assigned, which is a weak version of the unconfoundedness condition for .

Remark 4. (WEAK INDEX RESTRICTIONS) Theorem 3 is natural in the sense that the CP is an index function of the confounders (Wt, Xt) by Assumptions 2–3. The conditions and , in which is the error term defined in model (17), are the well-known weak or mean index restrictions widely used in semiparametric identifications [62–64]. Under weak index restrictions, the treatments effects can be uniquely identified and consistently estimated even if untreated units and IVs are not available.

Remark 5. (CMI IMPLIES CMS) To see how CMI implies CMS, for expositional purposes, we consider model (6) with (Remark 5 is still valid for the case where . But for convenience, we only consider the single treatment situation), and we assume that . For Assumption 5 (ii), we take . Because is a continuous and measurable function of , hence also satisfies Assumption 5 (ii). For Assumption 5 (i), note that

for some , , hence one can verify that for a = 1 and , where and . By the proof of Theorem 3, we can get for some a(t), , and . Substitute into Assumption 2 (i), we will then get

After simplifications, by Assumption 5 (i),

if and only if the following balance conditions are satisfied:

Simultaneously considering Theorem 3 and the above balance conditions, CMI implies CMS if and only if

where and ; and are defined in the proof of Theorem 3 (see the online Supplementary Material S2 File), , . The above simultaneous functions are equivalent to

where

Under this scenario one can verify, by some elementary but tedious computations, that , hence can be uniquely point identified and estimated by . This implies that CMI is compatible with CMS; that is, there is an equivalence relationship between CMI and CMS.

Inference with long range dependency

We consider a bootstrap procedure. For the estimation Step 3.1, we randomly divide the whole time interval into sub-intervals around tS, where and for . On each interval, we can get by Step 3.1. Thereby, by Theorem 2, the bootstrap sequence should be an i.i.d. stationary process with . The parameter of interest is . By [65] and [66], the limit behavior of the partial sum depends on the Hermite rank of the function G, where and . Let , , k ≥ 0 denote the k-th Hermite polynomial. The Hermite rank q of G is then defined as , . Suppose that for 0 < H < q−1 and is a slowly varying function at infinity [67], i.e., for all . Set , as we have

(18)

where , . converges in distribution to a multiple Wiener-Itô integral with respect to the random spectral measure W of Gaussian white-noise processes [46,66,68].
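To make the notion of Hermite rank concrete, the sketch below computes it numerically for a user-supplied function G, using Gauss-Hermite quadrature for the probabilists' Hermite polynomials. It assumes the standard definition (the smallest k ≥ 1 such that the k-th Hermite coefficient of the centered G(X), X ~ N(0, 1), is nonzero); the function name `hermite_rank` and the tolerance are illustrative choices, not part of the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_rank(G, max_order=6, n_nodes=40, tol=1e-8):
    """Smallest k >= 1 with E[(G(X) - E G(X)) He_k(X)] != 0 for X ~ N(0, 1)."""
    nodes, weights = hermegauss(n_nodes)         # weight function exp(-x^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)     # turn integrals into expectations
    g = G(nodes)
    g = g - np.sum(weights * g)                  # center G(X)
    for k in range(1, max_order + 1):
        coeffs = np.zeros(k + 1)
        coeffs[k] = 1.0                          # selects He_k in hermeval
        c_k = np.sum(weights * g * hermeval(nodes, coeffs))
        if abs(c_k) > tol:
            return k
    return None                                  # rank exceeds max_order

rank_linear = hermite_rank(lambda x: x)
rank_square = hermite_rank(lambda x: x ** 2)
rank_cubic = hermite_rank(lambda x: x ** 3 - 3.0 * x)
print(rank_linear, rank_square, rank_cubic)
```

The linear case (rank 1) corresponds to the normal limit discussed next, while functions such as x² (rank 2) lead to non-normal limits.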

For q = 1, has a normal limit distribution with mean zero and variance ; and for q ≥ 2, the limit distribution is non-normal. In this paper, we choose to reconstruct the normalizing factor dB so as to get a more straightforward limit distribution. As noted by [68,69], a wrong choice of dB will make the distribution of the statistic converge to a degenerate limit because tends to zero too fast. An appropriate construction could be , then we will get

Theorem 4. (Asymptotics of the bootstrap estimator) If q ≥ 1 and , then

where , is the cumulative distribution function of the normal distribution with mean zero and variance 1, PB is the probability measure defined on the bootstrap sequence. Furthermore,

if and only if q = 1.

Usually for model (6), hence for most of the DGPs, thereby the Hermite rank can be taken as . On this basis, we propose to estimate the Hurst parameter H by using the change-of-frequency (COF) estimator based on the second-order difference of ([70,71]):

where is the base-2 logarithm. The critical values and confidence interval can then be constructed correspondingly by [72]. The proposed bootstrap procedure can be applied similarly to the estimation Step 3.2, i.e., we randomly divide the whole time interval into sub-intervals around tD, then repeat Step 3.1 and Step 3.2 to get the bootstrap sequence , and the inference framework follows. Note that our bootstrap inference framework for accounts for the estimation uncertainty brought by the first estimation Step 3.1, because is a smooth function of as shown in Step 3.2.
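For readers who want to experiment, the following is a minimal sketch of a change-of-frequency estimator in the form commonly used in the literature: half the base-2 logarithm of the ratio of mean squared second-order differences taken at step 2 and at step 1. The exact expression used in the paper is not reproduced in this text, so the formula below, the function name `cof_hurst`, and the choice to apply it to a path with stationary increments are assumptions.

```python
import numpy as np

def cof_hurst(x):
    """Change-of-frequency estimate of H from second-order differences of x."""
    x = np.asarray(x, dtype=float)
    d1 = x[2:] - 2.0 * x[1:-1] + x[:-2]          # second difference, step 1
    d2 = x[4:] - 2.0 * x[2:-2] + x[:-4]          # second difference, step 2
    return 0.5 * np.log2(np.mean(d2 ** 2) / np.mean(d1 ** 2))

rng = np.random.default_rng(3)
random_walk = np.cumsum(rng.standard_normal(200_000))
h_hat = cof_hurst(random_walk)
print(h_hat)
```

For a self-similar path the variance of second-order differences scales with the step raised to the power 2H, so the ratio of the two averages is close to 2^{2H}. An ordinary random walk therefore gives an estimate near 0.5.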

Heterogeneous treatments effects and panel settings with staggered treatments adoptions

A forward point-wise estimation approach of the heterogeneous treatments effects

Consider now the following one-way heterogeneous treatments effects model (the coefficients are time-varying):

(19)

where , . All variables and parameters follow previous definitions. The parameters of interest are for and for .

STEP 4.1. (Estimation of for ) We estimate the following model respectively for the forward point-wise step :

(20)

where , and . Estimating model (20) by Step 3.1 for times, we obtain point-wise estimators respectively for the point-wise step h.

STEP 4.2. (Estimation of for ) Further consider the following model respectively for the forward point-wise step :

(21)

where

is the total treatment indicator, is the total treatment effect, . In accordance with Step 4.1, estimating model (21) by Step 3.1 for T − tD times yields point-wise estimators for the point-wise step h respectively.

STEP 4.3. (Estimation of and for ) Combining Step 4.1 and 4.2 to get , consider the following forward one step-wise regression model

where , , . Estimate this model by Step 3.1 for T − tD times, we will finally get the estimators

and

respectively for the point-wise step h.

Let , be i.i.d. random variables and be the corresponding sigma-filtration. For model (19) with , and defined in Remark 4, we assume that , , and , and are measurable functions such that and are well defined for each t and (see [73–75] for further definitions). The variance-covariance matrix of these random processes can be denoted as:

We now construct the asymptotic behavior of the point-wise estimator , the asymptotic behavior of can be similarly constructed, hence omitted here for brevity.

Theorem 5. (Asymptotics of the point-wise estimator) Assume that the functions , , , are stochastically Lipschitz continuous processes, and the smallest eigenvalues of the matrices , , and are bounded away from zero. Define respectively for , then we have

and

where

is the Hurst parameter,

and

for , and . , , are Gaussian processes with covariance

and

Theorem 5 states that the asymptotic distribution of the step-wise estimator is a fractional Brownian motion (fBm) process; when H = 0.5 the asymptotic distribution reduces to the usual Brownian motion. Inference for the step-wise estimator follows the bootstrap procedure proposed above.

Panel settings with staggered treatments adoptions and two-way heterogeneities

We now extend model (19) to allow for two-way heterogeneities. Consider the setting with N units in which each unit i receives one or several policies or treatments; the start times and end times of the treatments adoptions may, but need not, vary by unit. For the balanced staggered treatments adoptions, the whole time interval can be divided into sub-intervals: , , , ..., , where denotes the s-th sub-interval, ts is the initial time of the s-th sub-interval; and , is the length of the s-th sub-interval.

We assume that, on the s-th sub-interval, each unit i receives a treatment , where . Then denotes the treatments indicator for all units on the s-th sub-interval. Correspondingly, denotes the two-way heterogeneous treatments effects for all units on the s-th sub-interval with , represents the treatment effect for unit i at time t. A graphical and intuitive illustration of these definitions is given in the Online Supplementary Material S3 File.

Definition 6 (Phase states and phase transitions). The sub-intervals are called phase states, the switch from phase state s to s + 1 is called a phase transition. A phase state corresponds to a treatment state, the switch from one treatment state to another for the same unit i is called a heterogeneous phase transition if and only if , otherwise it is a homogeneous phase transition.

The DGP of panel settings with staggered treatments adoptions we consider now is

(22)

where denotes the horizontal stacking of matrices, i.e., representing the stacking of matrices A and B, ∘ denotes the Hadamard product, is the outcomes of all units on the whole time interval, is the outcomes of all units on the s-th sub-interval, and is the outcome of unit i on the s-th sub-interval, is the total length of the sub-intervals; are unobservable (or omitted) confounders driving and for with representing the confounder for unit i on the s-th sub-interval, the coefficients are with ; are observed exogenous control variables, which are structured similarly to , the structure of is similar to ; is the unit fixed effect and is the time fixed effect. We assume that: , and as , is the Hurst parameter, , . Note that Model (1) is a special case of model (22) with . The parameters of interest are the two-way heterogeneous treatments effects on every phase state .

The non-matrix form of (22) with forward point-wise step h is:

(23)

is the first time unit i receives treatment, ; and . For a given h, consider the step-wise first-order difference of model (23):

where , . Suppose that for all i, as we will get

(24)

with .

Assumption 7 For the Proximal Variables : (i) the random variables and are of high correlations, and are also of high correlation; (ii) is independent of ; (iii) there exists a continuous function , p ≥ 1, such that:

with , is defined similarly to model (7), .

In line with models (7), (8), (11) and (17), model (24) is then turned into a varying coefficients panel form. Estimate this model by varying coefficient fixed effects methods (see, e.g., [76–78]), and by the estimation Steps 3.1–3.2, Steps 4.1–4.3 and Assumption 7, we will get consistent estimators respectively for and by applying Theorem 2 and Theorem 5. The proposed bootstrap inference method can be applied here as well. See the Online Supplementary Material S3 File for details.

Numerical examples

Monte Carlo simulations

Data Generating Process. To evaluate the performances of our method, we generate potential outcomes from a varying coefficients linear model with staggered treatments:

(25)

where , , , , and , , , , , where U denotes the uniform distribution, denotes the -th quantile; , , . [A] denotes the order operator of the vector A, i.e., where for ; is a discrete-time fractional Brownian motion (fBm) with the Hurst parameter H:

denotes the sampling interval, , and are given parameters, denotes a zero-mean Gaussian process with the covariance function , [79]. is the discrete-time representation of the continuous-time fractional Ornstein-Uhlenbeck process [80].

Model (25) is endogenous because Xit is correlated with Dit, and we assume that Uit is unobservable. We take by Remark 1. The method based on fast Fourier transformation (FFT) to generate is provided in [81] and [79]. In this paper we consider H = 0.700, , , , T = 500, N = 5 and , hence L = 1.953125. We set , ai = 0, bi = 1, , , ei = 14, fi = 16, gi = −i, hi = i, and for model (25). The estimands we are interested in are
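As an illustration of FFT-based simulation of long-range dependent noise, the sketch below generates fractional Gaussian noise by circulant embedding (the Davies-Harte construction) and cumulates it into a discrete fBm path. Whether this matches the exact algorithm of [79,81] is an assumption; the function names, the unit sampling interval, and the seed are illustrative, and the fractional Ornstein-Uhlenbeck recursion of model (25) is not reproduced.

```python
import numpy as np

def fgn_autocov(k, H):
    """Autocovariance of unit-variance fractional Gaussian noise at lag k."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))

def simulate_fgn(n, H, rng):
    """Draw n values of fractional Gaussian noise via circulant embedding."""
    gamma = fgn_autocov(np.arange(n + 1), H)
    row = np.concatenate([gamma, gamma[-2:0:-1]])   # first row of a 2n x 2n circulant
    eig = np.fft.fft(row).real                      # its eigenvalues
    eig = np.clip(eig, 0.0, None)                   # guard against rounding below zero
    m = row.size
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    return np.fft.fft(np.sqrt(eig / m) * z).real[:n]

rng = np.random.default_rng(2)
noise = simulate_fgn(2 ** 14, 0.7, rng)             # H = 0.7 as in the simulations
fbm_path = np.cumsum(noise)                         # discrete fBm, unit time step
lag1 = np.corrcoef(noise[:-1], noise[1:])[0, 1]
print(noise.var(), lag1)
```

For H = 0.7 the theoretical lag-one autocorrelation of the noise is 2^{0.4} − 1, roughly 0.32, which gives a quick sanity check on a simulated draw.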

where is the panel model estimand; is the mean group estimand which averages coefficients obtained from separate time-series regressions for each individual; is the time unit i receives treatment, .
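The mean group idea described above (average the coefficients from separate unit-by-unit time-series regressions) can be sketched as follows. Plain OLS on a treatment dummy is used purely for illustration in place of the paper's estimator, and `mean_group` and the toy data are hypothetical.

```python
import numpy as np

def mean_group(y, D):
    """Average of unit-by-unit OLS slopes of y_it on a treatment dummy D_it.

    y and D are N x T arrays; returns the average and the per-unit slopes.
    """
    slopes = []
    for yi, Di in zip(y, D):
        X = np.column_stack([np.ones_like(Di), Di])
        beta, *_ = np.linalg.lstsq(X, yi, rcond=None)
        slopes.append(beta[1])
    slopes = np.array(slopes)
    return float(slopes.mean()), slopes

T, N = 20, 3
D = np.tile((np.arange(T) >= 10).astype(float), (N, 1))
tau = np.array([1.0, 2.0, 3.0])                   # unit-specific effects
y = np.arange(N, dtype=float)[:, None] + tau[:, None] * D
mg, unit_effects = mean_group(y, D)
print(mg, unit_effects)
```

With noise-free toy data the per-unit slopes recover the unit-specific effects exactly, and the mean group value is their simple average.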

We also compare our method with the literature when untreated units are available. We generate 10 untreated units for each treated unit through model (25), i.e., the j-th untreated outcome for the treated unit i is generated as follows:

which guarantees that is a martingale process for every i and j over the time interval .

Estimation Performances. The simulation results in terms of bias and mean squared error (MSE) are shown in Table 2. The results show that the treatment effects can be consistently estimated by our methods. The biases of the proposed method are generally the smallest among all competitors, and they decrease with sample size. The other estimators do not perform well due to the heterogeneities or endogeneities.

Table 2. Simulation results with long range dependency (H = 0.70).

https://doi.org/10.1371/journal.pone.0347847.t002

We now examine the performance of the confidence interval (CI) for the proposed methods. It can be seen from Table 3 that the coverage probabilities (CP) increase with sample size and generally perform well.

Table 3. Empirical coverage probability of the 95% CI for the treatment effect.

https://doi.org/10.1371/journal.pone.0347847.t003

Weak and Invalid CP. To illustrate the consequences if the chosen proximal variable is weak or invalid, we carried out additional Monte Carlo simulation studies. (i) Weak CP is generated by , i.e., we introduce different levels of noise into the CP variable and examine the resulting effects on the estimation outcomes. (ii) Invalid CP is generated by , i.e., we introduce different levels of invalidity into the CP variable and examine the resulting effects on the estimation outcomes. The results are shown in the following Tables 4–8.

Table 4. Simulation studies when the CP is weak or invalid (T = 30, N = 5; CP = t).

https://doi.org/10.1371/journal.pone.0347847.t004

Table 5. Simulation studies when the CP is weak or invalid (T = 50, N = 5; CP = CF treated).

https://doi.org/10.1371/journal.pone.0347847.t005

Table 6. Simulation studies when the CP is weak or invalid (T = 50, N = 5; CP = CF untreated).

https://doi.org/10.1371/journal.pone.0347847.t006

Table 7. Simulation studies when the CP is weak or invalid (T = 50, N = 5; CP = CF treated).

https://doi.org/10.1371/journal.pone.0347847.t007

Table 8. Simulation studies when the CP is weak or invalid (T = 50, N = 5; CP = CF untreated).

https://doi.org/10.1371/journal.pone.0347847.t008

Interestingly, we find that the bias in estimating the treatment effect increases with the invalidity of the CP variable, but not with its weakness. That is, our proposed methods remain largely unbiased even when the CP variable is weak (as shown in Table 4). To understand this phenomenon, a theoretical analysis proving that our estimators are robust to a weak CP variable is provided in the Online Supplementary Material S5 File.

Short panels and simple serial correlation. In the paper, we employ the Change-of-Frequency (COF) estimator to estimate the Hurst parameter H. This estimator is based on the second differences of the bootstrap sequence, and its consistency depends on B rather than on T. In practice, the Hurst parameter is only used to construct the normalization factor . As long as the bootstrap sample size B is sufficiently large, the confidence intervals computed from short time sequences can still maintain the nominal coverage level. To see this, we have conducted simulation studies specifically designed for short time sequences.

From Table 9, we find that even for short panel data, the treatment effect estimator remains consistent. At the same time, we observe from Fig 6 that the bias of the Hurst parameter estimator decreases as the number of bootstrap replications increases, indicating that the proposed COF estimator is feasible. Hence, the estimators perform well even if the time series is short.

Fig 6. Simulation studies for the COF estimator.

The figure shows the bias of the Hurst estimator vs. the number of bootstrap replications for different levels of long range dependency.

https://doi.org/10.1371/journal.pone.0347847.g006

Apart from these, we carried out additional simulation studies to illustrate our methods’ performances with simple serial correlation, i.e., the model’s error term is generated by an AR(1) model with auto-correlation coefficient .

The robustness of the proposed bootstrap inference framework to both short and long memory shown in Table 10 stems from its sub-interval resampling design, which non-parametrically captures the underlying dependence structure without imposing parametric assumptions. For short memory (e.g., AR(1) with exponentially decaying auto-correlations), the randomly partitioned sub-intervals Ib are asymptotically independent as the sample size T grows. Consequently, the bootstrap sequence becomes approximately i.i.d., rendering conventional inference valid. For long-range dependence (LRD) where auto-correlations decay hyperbolically, the bootstrap sequence itself retains the long memory property, with its covariance structure satisfying . Rather than ignoring this dependence, the proposed method directly characterizes the limiting distribution of the partial sum . As established in Theorem 4, after appropriate normalization, the statistic converges in distribution to a standard normal with the Hermite rank q = 1. This unified framework automatically adapts to the dependence strength: under short memory, the Hurst parameter H plays no distorting role, and under long memory, it is explicitly incorporated into the normalization factor dB, ensuring valid coverage regardless of the true memory process.
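The contrast between exponentially and hyperbolically decaying auto-correlations can be made concrete with a small numerical comparison. The sketch uses an AR(1) process with coefficient 0.5 and fractional Gaussian noise with H = 0.7; both parameter values and the function names are illustrative assumptions, not the settings behind Table 10.

```python
import numpy as np

def ar1_autocorr(k, phi):
    """Autocorrelation of a stationary AR(1) process at lag k."""
    return phi ** np.asarray(k, dtype=float)

def fgn_autocorr(k, H):
    """Autocorrelation of fractional Gaussian noise at lag k >= 0."""
    k = np.asarray(k, dtype=float)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))

lags = np.arange(1, 2001)
short = ar1_autocorr(lags, phi=0.5)               # exponential decay
long_ = fgn_autocorr(lags, H=0.7)                 # hyperbolic decay
approx_100 = 0.7 * (2 * 0.7 - 1) * 100.0 ** (2 * 0.7 - 2)   # H(2H-1)k^(2H-2)
print(short.sum(), long_.sum(), long_[99], approx_100)
```

The AR(1) auto-correlations are summable (their sum is close to 1 here), whereas the partial sums for H = 0.7 keep growing with the number of lags, which is why the normalization of the partial sum has to depend on H under long memory.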

Table 10. Simulation studies with simple serial correlation.

https://doi.org/10.1371/journal.pone.0347847.t010

Unilateral divorce law reforms and divorce growth

We illustrate our proposed methods by analyzing the heterogeneous treatment effects of no-fault divorce law reforms on divorce growth in China. Unilateral (or no-fault) divorce reform carried out worldwide allows either spouse to end a marriage, redistributing property rights and bargaining power relative to fault-based divorce regimes [85,86]. The Marriage Law of the People’s Republic of China (promulgated in 2003 and implemented in 2004) stipulates that the condition for divorce is emotional breakdown rather than one party’s fault, so either spouse can file a divorce lawsuit even if neither spouse is at fault (before the Marriage Law of the People’s Republic of China (2004), one spouse could file a divorce lawsuit in China only if the other party was at fault, i.e., bigamy, cohabitation with others, domestic violence, maltreatment, or abandonment. Many divorce lawsuits could not be supported by the court because one spouse could not provide sufficient evidence to prove the other’s fault. For more details on Chinese divorce system). What is special for China is that the implementation of the law is “one size fits all”, which means that all 34 provincial districts across the country are treated units. Hence we are unable to collect untreated units for policy evaluation. The method proposed in this paper is therefore most suitable.

All 30 provinces’ registered divorce data per year from 1990 to 2010 are collected from the National Bureau of Statistics (The data of Chongqing etc. are not provided by the official statistics. Please see the website of the National Bureau of Statistics for reasons: https://data.stats.gov.cn/english/easyquery.htm?cn=E0103). The divorces all show an upward growth trend, hence we take for all i; we take Zit to be the GDP per capita, because GDP per capita is found to be highly correlated with factors driving the growth of divorce [85]. Dit takes value 1 after the year 2003 and 0 otherwise.

As shown in Definition 5 and Theorem 1, if there exists a structural change Sit in divorce before the implementation of the law Dit, the treatment effect could be contaminated by the structure change effect. Hence, the structure change effect should be removed when evaluating the law enforcement effect. Sit is then detected by the Chow test through the function “sctest” in the R package “strucchange”, using data ranging from 1990 to 2002. The structure changes and treatments statuses are shown in Fig 7(a). Fig 7(b) further shows that staggering effects exist in the structure changes [16].
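For reference, the classical Chow F-statistic for a single known break date can be written in a few lines. This is a generic Python sketch on synthetic data covering 13 annual observations (mimicking 1990 to 2002); it is not the authors' R code, and the break position and the series are made up for illustration.

```python
import numpy as np

def chow_f_stat(y, X, break_idx):
    """Chow F-statistic for one known break; returns (F, (df1, df2))."""
    def rss(yy, XX):
        beta, *_ = np.linalg.lstsq(XX, yy, rcond=None)
        resid = yy - XX @ beta
        return resid @ resid
    n, k = X.shape
    rss_pooled = rss(y, X)                         # one regime for all years
    rss_split = rss(y[:break_idx], X[:break_idx]) + rss(y[break_idx:], X[break_idx:])
    f_stat = ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
    return f_stat, (k, n - 2 * k)

rng = np.random.default_rng(4)
t = np.arange(13, dtype=float)                     # 13 synthetic annual observations
X = np.column_stack([np.ones_like(t), t])          # intercept and linear trend
y = 1.0 + 0.5 * np.minimum(t, 7.0) - 0.5 * np.maximum(t - 7.0, 0.0)
y = y + 0.1 * rng.standard_normal(13)              # trend reverses after index 7
f_stat, dof = chow_f_stat(y, X, break_idx=7)
print(f_stat, dof)
```

The statistic is compared with an F distribution with the returned degrees of freedom. With only 13 observations per province, such a test has limited power, so the detected break dates should be read with that caveat in mind.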

Fig 7. Structure changes and treatments statuses (left), and Goodman-Bacon decomposition (right).

Treatment level 1 indicates the periods when structure changes commence and persist, level 2 indicates the periods when treatment starts and continues, as well as the periods when structure changes last after level 1.

https://doi.org/10.1371/journal.pone.0347847.g007

The parameters we are interested in are the average treatment effect on the treated, the average structure change effect and the total average effect:

in which , , , and is the start time of the structure change for province i as shown in Fig 7(a). The estimation results are collected in Table 11.

Table 11. Unilateral divorce law reform effect on divorce growth.

https://doi.org/10.1371/journal.pone.0347847.t011

The results show that the structure changes around 1996–1999 lead to a decline of divorce (33240 couples on average), while the implementation of the unilateral divorce reform since 2003 leads to an increase of divorce (44780 couples on average). TWFE and OLS underestimate the treatment effects and structure change effects because they neglect the heterogeneities and endogeneities caused by two-way variability and potential confounders.

The 1997 Asian financial crisis exerted great negative economic impacts on China: many people lost their jobs or experienced a decrease in assets and income. Meanwhile, a major flood occurred across China in 1998, resulting in direct economic losses exceeding 20 billion USD. Under these scenarios, couples tend to overcome difficulties together, resulting in a decrease in divorce [85]. On the other hand, the unilateral divorce reform lowers the threshold and cost of divorce, making it easier to divorce, hence resulting in an increase of divorce. This finding is in accordance with the literature [86,87].

We also report the point-wise estimations of the ASCs and ATTs in Fig 8 (left: ). It can be seen that neglecting the treatment Dit (Fig 8(a)) or structure changes Sit (Fig 8(b)) will lead to contamination bias [20], and the bias will lead to wrong empirical conclusions. This warns us to be aware of other potential distributional shifts if our focus is the effect of some specific social policy treatment.

Conclusion

We propose a novel method to disentangle one treatment effect from another in time-series natural experiment settings with staggered and endogenous treatments adoptions. We allow the social-economic outcome to be long range dependent, and neither untreated units nor IVs need to be available. The Conditional Mean Symmetry condition constructed by the proposed Common Proximal Variable enables us to identify the treatments effects under the Neyman-Rubin counterfactual framework, while a two-step point-wise estimation approach allows us to consistently estimate the heterogeneous treatments effects. We show that the asymptotic distribution of the estimator is a fractional Brownian motion process with long range dependency, and therefore a bootstrap procedure is considered for inference. We provide a new identification and estimation framework for empirical researchers and decision-makers whose interest lies in evaluating social policies effects.

Supporting information

S2 File.

Main proofs of Proposition and Theorems.

https://doi.org/10.1371/journal.pone.0347847.s002

(PDF)

S3 File.

Detailed estimation procedure for panel model.

https://doi.org/10.1371/journal.pone.0347847.s003

(PDF)

S4 File.

Illustration of panel settings with staggered treatments adoptions.

https://doi.org/10.1371/journal.pone.0347847.s004

(PDF)

S5 File.

Impact of choosing a weak or invalid CP variable on estimation and inference.

https://doi.org/10.1371/journal.pone.0347847.s005

(PDF)

References

  1. Conant GC, Wolfe KH. Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet. 2008;9(12):938–50. pmid:19015656
  2. Easterly W, Rebelo S. Fiscal policy and economic growth. J Monet Econ. 1993;32(3):417–58.
  3. Zagler M, Dürnecker G. Fiscal policy and economic growth. J Econ Surv. 2003;17(3):397–418.
  4. Ke X, Hsiao C. Economic impact of the most drastic lockdown during COVID-19 pandemic-The experience of Hubei, China. J Appl Econ (Chichester Engl). 2022;37(1):187–209. pmid:34518735
  5. Angrist J, Pischke J. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press. 2009.
  6. Bound J, Jaeger D, Baker R. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. J Am Stat Assoc. 1995;90(430):443–50.
  7. Young A. Consistency without inference: Instrumental variables in practical application. Eur Econ Rev. 2022;147:104–12.
  8. Cinelli C, Hazlett C. An omitted variable bias framework for sensitivity analysis of instrumental variables. Biometrika. 2025;112(2).
  9. Hahn J. On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects. Econometrica. 1998;66(2):315.
  10. Chernozhukov V, Wuthrich K, Zhu Y. An exact and robust conformal inference method for counterfactual and synthetic controls. J Am Stat Assoc. 2021;116(536):1849–64.
  11. Roth J, Sant’Anna P, Bilinski A, Poe J. What’s trending in difference-in-differences? A synthesis of the recent econometrics literature. J Econom. 2023;235(2):2218–44.
  12. Borusyak K, Jaravel X. Revisiting event study designs. In: 2017.
  13. Imai K, Kim I. When should we use unit fixed effects regression models for causal inference with longitudinal data?. Am J Polit Sci. 2019;63(2):467–90.
  14. de Chaisemartin C, D’Haultfœuille X. Two-way fixed effects estimators with heterogeneous treatment effects. Am Econ Rev. 2020;110(9):2964–96.
  15. Sun L, Abraham S. Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. J Econom. 2021;225(2):175–99.
  16. Goodman-Bacon A. Difference-in-differences with variation in treatment timing. J Econom. 2021;225(2):254–77.
  17. Athey S, Imbens G. Design-based analysis in difference-in-differences settings with staggered adoption. J Econom. 2022;226(1):62–79.
  18. Callaway B, Sant’Anna P. Difference-in-differences with multiple time periods. J Econom. 2021;225(2):200–30.
  19. Roth J, Sant’Anna PHC. When Is Parallel Trends Sensitive to Functional Form?. Econometrica. 2023;91(2):737–47.
  20. de Chaisemartin C, D’Haultfœuille X. Two-way fixed effects and differences-in-differences estimators with several treatments. J Econom. 2023;236(2):480–500.
  21. Qiu Y, Chen X, Shi W. Impacts of social and economic factors on the transmission of coronavirus disease 2019 (COVID-19) in China. J Popul Econ. 2020;33(4):1127–72. pmid:32395017
  22. Masini R, Medeiros MC. Counterfactual analysis and inference with nonstationary data. J Bus Econ Stat. 2022;40(1):227–39.
  23. Abadie A. Using synthetic controls: feasibility, data requirements, and methodological aspects. J Econ Lit. 2021;59(2):391–425.
  24. Shi Z, Huang J. Forward-selected panel data approach for program evaluation. J Econom. 2023;234:512–35.
  25. Arkhangelsky D, Imbens G. Causal models for longitudinal and panel data. CEMFI. 2023.
  26. Manski CF. Identification of binary response models. J Am Stat Assoc. 1988;83(403):729–38.
  27. Powell JL. Estimation of semiparametric models. Handbook of econometrics. Elsevier. 1994. p. 2443–521.
  28. Heckman J, Ichimura H, Todd P. Matching as an econometric evaluation estimator: evidence from evaluating a job training programme. Rev Econ Stud. 1997;64(4):605–54.
  29. Hirano K, Imbens GW. Estimation of Causal Effects using Propensity Score Weighting: An Application to Data on Right Heart Catheterization. Health Services & Outcomes Research Methodology. 2001;2(3–4):259–78.
  30. Hirano K, Imbens GW, Ridder G. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score. Econometrica. 2003;71(4):1161–89.
  31. Masten M, Poirier A. Choosing exogeneity assumptions in potential outcome models. Econom J. 2023;26(3):327–49.
  32. White H. Time-series estimation of the effects of natural experiments. J Econom. 2006;135(1):527–66.
  33. White H, Lu X. Granger causality and dynamic structural systems. J Financ Econom. 2010;8(2):193–243.
  34. Lu X, Su L, White H. Granger Causality And Structural Causality In Cross-section And Panel Data. Econom Theory. 2016;33(2):263–91.
  35. Lechner M. The relation of different concepts of causality used in time series and microeconometrics. Econom Rev. 2011;30(1):109–27.
  36. Li K, Bell D. Estimation of average treatment effects with panel data: asymptotic theory and implementation. J Econom. 2017;197(1):65–75.
  37. Abadie A, Cattaneo MD. Introduction to the special section on synthetic control methods. J Am Stat Assoc. 2021;116(536):1713–5.
  38. Angrist JD, Kuersteiner G. Causal effects of monetary shocks: Semiparametric conditional independence tests with a multinomial propensity score. Rev Econ Stat. 2011;93(3):725–47.
  39. Angrist JD, Jordà O, Kuersteiner G. Semiparametric estimates of monetary policy effects: String theory revisited. J Bus Econ Stat. 2018;36(3):371–87.
  40. Bojinov I, Shephard N. Time series experiments and causal estimands: Exact randomization tests and trading. J Am Stat Assoc. 2019;114(528):1665–82.
  41. Koralov L, Sinai YG. Theory of Probability and Random Processes. 2 ed. Springer. 2007.
  42. Hudgens MG, Halloran ME. Toward Causal Inference With Interference. J Am Stat Assoc. 2008;103(482):832–42. pmid:19081744
  43. Forastiere L, Airoldi EM, Mealli F. Identification and estimation of treatment and interference effects in observational studies on networks. J Am Stat Assoc. 2021;116(534):901–18.
  44. Rubin DB. Bayesian Inference for Causal Effects: The Role of Randomization. Ann Statist. 1978;6(1).
  45. Holland P. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945–60.
  46. Jacod J, Shiryaev AN. Limit theorems for stochastic processes. 2nd ed. Springer. 2003.
  47. Revuz D, Yor M. Continuous martingales and brownian motion. Springer. 2013.
  48. Samorodnitsky G. Long memory and self-similar processes. Annales de la Faculté des sciences de Toulouse : Mathématiques. 2009;15(1):107–23.
  49. Rubin D. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688–701.
  50. Heckman J, Ichimura H, Smith J, Todd P. Characterizing Selection Bias Using Experimental Data. Econometrica. 1998;66(5):1017.
  51. Granger CWJ. Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica. 1969;37(3):424.
  52. Sims CA. Macroeconomics and Reality. Econometrica. 1980;48(1):1.
  53. Menchetti F, Bojinov I. Estimating the effectiveness of permanent price reductions for competing products using multivariate Bayesian structural time series models. Ann Appl Stat. 2022;16(1).
  54. Menchetti F, Cipollini F, Mealli F. Combining counterfactual outcomes and ARIMA models for policy evaluation. Econom J. 2022;26(1):1–24.
  55. Fernández-Val I, Freeman H, Weidner M. Low-rank approximations of nonseparable panel models. Econom J. 2021;24(2):C40–77.
  56. Ishihara T. Identification and estimation of time-varying nonseparable panel data models without stayers. J Econom. 2019;215(1):184–208.
  57. Delgado MA, Escanciano JC. Nonparametric tests for conditional symmetry in dynamic models. J Econom. 2007;141:652–82.
  58. Alonso C, Gutierrez J, Recio T. A note on separated factors of separated polynomials. Journal of Pure and Applied Algebra. 1997;121(3):217–22.
  59. Alonso C, Gutierrez J, Recio T. A rational function decomposition algorithm by near-separated polynomials. J Symb Comput. 2007;19(6):527–44.
  60. Li T. Indirect inference in structural econometric models. J Econom. 2010;157(1):120–8.
  61. Robinson PM. Root-N-Consistent Semiparametric Regression. Econometrica. 1988;56(4):931.
  62. Manski CF. Identification of Binary Response Models. J Am Stat Assoc. 1988;83(403):729–38.
  63. Chamberlain G. Efficiency Bounds for Semiparametric Regression. Econometrica. 1992;60(3):567.
  64. Powell JL. Estimation of semiparametric models. Handbook of econometrics. 1994. p. 2443–521.
  65. Taqqu MS. Weak convergence to fractional brownian motion and to the rosenblatt process. Z Wahrscheinlichkeitstheorie verw Gebiete. 1975;31(4):287–302.
  66. Dobrushin RL, Major P. Non-central limit theorems for non-linear functional of Gaussian fields. Z Wahrscheinlichkeitstheorie verw Gebiete. 1979;50(1):27–52.
  67. Resnick S. On the foundations of multivariate heavy-tail analysis. Journal of Applied Probability. 2004;41(A):191–212.
  68. Lahiri S. Resampling Methods for Dependent Data. Springer. 2013.
  69. Lahiri S. Refinements in asymptotic expansions for sums of weakly dependent random vectors. Ann Probab. 1993;21:791–9.
  70. Lang G, Roueff F. Semi-parametric Estimation of the Hölder Exponent of a Stationary Gaussian Process with Minimax Rates. Statistical Inference for Stochastic Processes. 2001;4(3):283–306.
  71. Barndorff-Nielsen OE, Corcuera JM, Podolskij M. Limit Theorems for Functionals of Higher Order Differences of Brownian Semi-Stationary Processes. Springer Proceedings in Mathematics & Statistics. Springer Berlin Heidelberg. 2012. p. 69–96. https://doi.org/10.1007/978-3-642-33549-5_4
  72. Cai Z, Juhl T. The distribution of rolling regression estimators. J Econom. 2022;235(2):1447–63.
  73. Zhou Z, Wu W. Simultaneous inference of linear models with time varying coefficients. J R Stat Soc Ser B. 2010;72:513–31.
  74. Wu W, Zhou Z. Gaussian approximations for non-stationary multiple time series. Stat Sin. 2011;21:1397–413.
  75. Wu W, Zhou Z. Multiscale jump and volatility analysis for high-frequency financial data. J Am Stat Assoc. 2018;113(523):1215–27.
  76. Li Q, Racine J. Nonparametric Econometrics: Theory and Practice. Princeton University Press. 2006.
  77. Su L, Wang X, Jin S. Sieve estimation of time-varying panel data models with latent structures. J Bus Econ Stat. 2019;37(3):334–49.
  78. Su L, Wang X, Yang Z. Sieve estimation of panel data models with cross-section dependence. J Econom. 2023;235(2):1447–63.
  79. Kubilius K, Mishura Y, Ralchenko K. Parameter Estimation in Fractional Diffusion Models. Springer. 2017.
  80. Kroese D, Botev Z, Taimre T, Vaisman R. Data science and machine learning: mathematical and statistical methods. Chapman and Hall/CRC. 2019.
  81. Wang X, Xiao W, Yu J. Modeling and forecasting realized volatility with the fractional Ornstein-Uhlenbeck process. J Econom. 2023;232(2):389–415.
  82. Sant’Anna P, Zhao J. Doubly robust difference-in-differences estimators. J Econom. 2020;219(1):101–22.
  83. Arkhangelsky D, Athey S, Hirshberg DA, Imbens GW, Wager S. Synthetic difference-in-differences. Am Econ Rev. 2021;111(12):4088–118.
  84. Xu Y. Generalized synthetic control method: causal inference with interactive fixed effects models. Polit Anal. 2017;25(1):57–76.
  85. Brien M, Lillard L, Stern S. Cohabitation, marriage, and divorce in a model of match quality. Int Econ Rev. 2006;47(2):451–94.
  86. Stevenson B, Wolfers J. Bargaining in the shadow of the law: Divorce laws and family distress. Q J Econ. 2006;121(1):267–88.
  87. Kim D, Oka T. Divorce law reforms and divorce rates in the USA: An interactive fixed-effects approach. J Appl Econom. 2014;29(2):231–45.