Generalized Theorems for Nonlinear State Space Reconstruction

Takens' theorem (1981) shows how lagged variables of a single time series can be used as proxy variables to reconstruct an attractor for an underlying dynamic process. State space reconstruction (SSR) from single time series has been a powerful approach for the analysis of the complex, non-linear systems that appear ubiquitous in the natural and human world. The main shortcoming of these methods is the phenomenological nature of attractor reconstructions. Moreover, applied studies show that these single time series reconstructions can often be improved ad hoc by including multiple dynamically coupled time series in the reconstructions, to provide a more mechanistic model. Here we provide three analytical proofs that add to the growing literature to generalize Takens' work and that demonstrate how multiple time series can be used in attractor reconstructions. These expanded results (Takens' theorem is a special case) apply to a wide variety of natural systems having parallel time series observations for variables believed to be related to the same dynamic manifold. The potential information leverage provided by multiple embeddings created from different combinations of variables (and their lags) can pave the way for new applied techniques to exploit the time-limited, but parallel observations of natural systems, such as coupled ecological systems, geophysical systems, and financial systems. This paper aims to justify and help open this potential growth area for SSR applications in the natural sciences.


Introduction
A growing realization in many natural sciences is that simple idealized notions of linearly decomposable, fixed equilibrium systems often do not accord with reality. Rather, empirical measurements on ecosystems, metabolic systems, financial networks, and the like suggest a more complex, but potentially more information-rich paradigm at work [1][2][3][4][5][6][7][8][9][10][11][12][13][14]. Despite a long history of linear methods development in the engineering sciences, natural systems are generally not well described as sums of independent frequencies that can be sensibly decomposed, analyzed as noninteracting, and reassembled (e.g. Fourier or spectral analysis) in the style of traditional reductionism [15,16]. Rather, quantitative measurements show many systems to be fundamentally nonequilibrium and unstable, in a manner more consistent with nonlinear (state dependent) dynamics occurring on a strange attractor manifold M, where relationships between state variables cannot be studied independently of the overall system state [17][18][19][20][21][22][23][24][25][26][27]. This emergent comprehensive view may help explain why many natural systems, such as those mentioned above, appear so difficult to understand and predict. Mirage correlations are commonplace in nonlinear systems where the manifold may contain trajectories that can temporarily exhibit positive correlations between variables for surprisingly long time periods (and in some regions of the state space) and can subsequently and rapidly exhibit negative correlations or no relationship in other time periods (and other regions of M). This transient property of apparent non-stationarity in correlations is one of the confounding phenomena faced by traditional linear models, which require continual refitting and exhibit little or no predictive power.
In this paper, we present two general theorems that address the problem of characterizing the coupled dynamics of nonlinear systems using time series observations on a manifold M. A special case of these theorems, attributed originally to Takens [12], provided the first sketch of a mathematical proof for reconstructing a diffeomorphic shadow manifold M' using lags of a single time series as coordinate axes. The basic idea, demonstrated earlier by Packard, Crutchfield, Farmer, and Shaw [28] and Crutchfield [2], is that under generic conditions a shadow manifold M' can be created using time-lagged observations of M based on a single observation function (Cartesian coordinate variable), such that M' is related to M by a smooth, smoothly invertible 1:1 mapping. Subsequently, Sauer, Yorke, and Casdagli [29] provided a definitive proof and an explicit extension of Takens' theorem to fractal sets; their theorems are also more powerful than the original theorem, as they show that embeddings are not just generic, in the sense of being open and dense in the set of all mappings, but that almost every mapping, in the sense of prevalence [30], is an embedding (see [30] for an in-depth explanation of the advantages of "prevalence" over "generic"). The theorem was also extended by Stark, Broomhead, Davies, and Huke [31,32] and Stark [33] to include certain classes of stochastic systems. Practical methods for reconstruction have also been explored, particularly to address the presence of noise in real data (e.g. [29,34]). Casdagli et al. [35] give a thorough treatment of such techniques based on transformations of univariate maps, showing how optimal noise reduction can be achieved. These very important prior results all focused on reconstruction from a single time series; however, as proven below, they can be extended to the more practically significant case where multiple observation functions are used to generate M'.
Here we prove the more general case of multivariate embeddings (embeddings using multiple time series and lags thereof), and show how time series information can be leveraged if multiple time series and their lags are used to construct embeddings of M'. These theorems pave the way for more extensive use of state space reconstruction methods in practical applications where long time series may not be available, so that multiple diffeomorphic embeddings may be created in factorial fashion to more fully exploit the coupled non-redundant information that can be extracted from multiple time series (multiple observation functions of dynamics on a manifold) to create predictive shadow manifolds [36]. The use of multiple time series allows the possibility of noise reduction that exceeds the limitations of univariate reconstructions in the presence of noise [35].
The possibility of extending Takens' theorem to allow lags of multiple observation functions was mentioned in Remark 2.9 of [29], but was not explicitly proven. The remark was also restricted to mappings formed strictly from consecutive lags, which is not the only possibility that needs to be considered in the multivariate case. Given the potential importance of multivariate reconstructions, we believe a full proof is required, in particular one that extends the generalization to non-consecutive lags. We show how Takens' theorem is a special case of our more general Theorem 2 (below), and by following the structure of Takens' original proof we clarify the logic and highlight the restrictions and special (non-generic) cases that can arise in its application to real-world systems. We then give an explicit proof of a stronger version of Remark 2.9 from Sauer et al. [29] that allows non-consecutive lags. This third theorem is stronger than the first two in the sense that it shows embeddings are prevalent and not just generic. For readers less familiar with the area, we begin with a brief overview of some basic terms and concepts used in our proofs.

Some Basic Concepts of Embedding Theory
Consider the classic Lorenz attractor [37] shown in Figure 1(a), consisting of trajectories in three-dimensional space that together define a butterfly-shaped surface or manifold. For simplicity, a manifold can be thought of as a generalized, n-dimensional surface embedded in some higher-dimensional space, where the dimension of the manifold may be fractal (as is the case for the Lorenz attractor). More generally, an embedding is a multivariate transformation of a manifold that resolves all trajectories on the original manifold without crossings. That is, an embedding is globally 1:1 in that it resolves all singularities in the trajectories that define the manifold (singularities are points on the manifold where trajectories cross, so that future paths are not uniquely determined).
An immersion is a local embedding that may not preserve the global topology of a manifold. Rather, an immersion preserves the topology of every local neighborhood of the original manifold, so that the tangent space at each point of the immersed manifold has the same dimensionality as that of the true manifold. Thus, an immersion is a mapping that is 1:1 between any sufficiently small "piece" of the true manifold and the immersed manifold. However, this condition does not guarantee that the global topology is preserved. This is illustrated in Figure 1(c), where two different pieces of the original manifold are mapped to the same piece of the immersed manifold, producing an immersion that is not an embedding. Immersions are nonetheless a useful conceptual stepping stone for constructing proofs about embeddings, since all embeddings are necessarily immersions.
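Figure 1(c) is the paper's own example of this distinction. For a smaller, self-contained illustration, the sketch below uses a hypothetical figure-eight curve (our choice, not from the paper): the circle mapped into the plane by θ ↦ (sin θ, sin 2θ). Its derivative never vanishes, so it is an immersion, but θ = 0 and θ = π are sent to the same point, so it is not 1:1 and therefore not an embedding. It assumes NumPy.

```python
import numpy as np

# Hypothetical example (not from the paper): a "figure eight" map of the circle
# S^1, parametrized by theta in [0, 2*pi), into the plane R^2.
def figure_eight(theta):
    return np.stack([np.sin(theta), np.sin(2.0 * theta)], axis=-1)

def figure_eight_velocity(theta):
    # Analytic derivative of figure_eight with respect to theta.
    return np.stack([np.cos(theta), 2.0 * np.cos(2.0 * theta)], axis=-1)

theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)

# Immersion check: the derivative (a 2x1 matrix) has full rank 1 wherever it is
# nonzero, so a strictly positive minimum speed means the map is an immersion.
min_speed = np.linalg.norm(figure_eight_velocity(theta), axis=-1).min()

# Injectivity check: theta = 0 and theta = pi are distinct points of the circle
# that land on the same point of the plane, so the map is not an embedding.
collision = np.allclose(figure_eight(np.array(0.0)), figure_eight(np.array(np.pi)))

print(f"minimum speed: {min_speed:.3f}")
print(f"theta = 0 and theta = pi collide: {collision}")
```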
The Lorenz attractor, Figure 1(a), provides an excellent example to illustrate both of these concepts. Consider two different multivariate functions that transform the original manifold, $\Phi_y = (y(t), y(t-\tau), y(t-2\tau))$ and $\Phi_z = (z(t), z(t-\tau), z(t-2\tau))$, where $\tau$ is a small time lag as in Takens' theorem. Both of these functions map points on the true manifold to points on a shadow manifold, shown in Figures 1(b) and 1(c). Examining these shadow manifolds, it is evident that both are immersions of the Lorenz attractor, because zooming in on a particular piece of either reveals that the tangent spaces have the same dimensionality as the original. However, only Figure 1(b) is an embedding that successfully reproduces the two lobes of the butterfly. The reconstruction in Figure 1(c), based on lags of the z-coordinate, fails to do so because the two fixed points of the original attractor have the same z-coordinate; they are mapped to the same point on the shadow manifold, so the two lobes are stacked on top of each other. This singularity is a consequence of a special, non-generic symmetry in the Lorenz system that violates an assumption of Takens' theorem. Figure 1(d) shows an embedding based on lags of both the y- and z-coordinates and is an example of the generalized mappings addressed in this paper.
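The following minimal sketch, which assumes NumPy, shows how the shadow manifolds in Figure 1 can be generated. The Lorenz parameters (σ = 10, ρ = 28, β = 8/3), the fourth-order Runge-Kutta step dt = 0.01, and the lag τ = 8 dt = 0.08 are the values reported for Figure 1. The initial condition (1, 1, 1) and the discarded 1,000-step transient are our own assumptions. As in the text, the delay vectors use past lags t − τ.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # Lorenz parameters from the Figure 1 caption
DT = 0.01                                 # RK4 time step, as in the caption
LAG = 8                                   # tau = 8 * dt = 0.08, counted in samples

def lorenz(state):
    x, y, z = state
    return np.array([SIGMA * (y - x), RHO * x - x * z - y, x * y - BETA * z])

def rk4_trajectory(state0, n_steps, dt=DT):
    """Integrate the Lorenz system with the classical fourth-order Runge-Kutta scheme."""
    out = np.empty((n_steps, 3))
    s = np.asarray(state0, dtype=float)
    for i in range(n_steps):
        out[i] = s
        k1 = lorenz(s)
        k2 = lorenz(s + 0.5 * dt * k1)
        k3 = lorenz(s + 0.5 * dt * k2)
        k4 = lorenz(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return out

# Assumed initial condition and transient length (not stated in the paper).
traj = rk4_trajectory([1.0, 1.0, 1.0], 10_000)[1_000:]
x, y, z = traj.T
t = np.arange(2 * LAG, len(traj))  # sample indices at which t - 2*tau exists

phi_y = np.column_stack([y[t], y[t - LAG], y[t - 2 * LAG]])   # panel (B): lags of y
phi_z = np.column_stack([z[t], z[t - LAG], z[t - 2 * LAG]])   # panel (C): lags of z
phi_yz = np.column_stack([y[t], y[t - LAG], z[t]])            # panel (D): y and z mixed
```

Plotting the rows of `phi_y`, `phi_z`, and `phi_yz` as 3-D point clouds should reproduce the qualitative pictures in panels (B) to (D). The failure in panel (C) comes from the symmetry mentioned above. The Lorenz equations are unchanged under (x, y, z) → (−x, −y, z), so a trajectory and its mirror image share the same z(t) series, and $\Phi_z$ sends them to the same points.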

Two Theorems in the Style of Takens: The Generic Case
Let M be a compact manifold of dimension m. A dynamical system is a diffeomorphism $\varphi$ defining the trajectories or "flow" on M for discrete time, or a vector field X on M for continuous time. Takens [12] proved that, generically, given $\varphi$ and M, a smooth observation function $y: M \to \mathbb{R}$ can be used to construct an embedding of M in 2m+1 dimensions under the transformation $\Phi_{(\varphi,y)}: M \to \mathbb{R}^{2m+1}$, where $\Phi_{(\varphi,y)}(x) = \langle y(x), y(\varphi(x)), y(\varphi^2(x)), \ldots, y(\varphi^{2m}(x)) \rangle$. Here the components $\langle y(x), y(\varphi(x)), y(\varphi^2(x)), \ldots, y(\varphi^{2m}(x)) \rangle$ correspond to time-lagged observations of the dynamics on M defined by $\varphi$. Notice that such mappings involve a single distinct observation function (i.e. a single time series) and represent a small subset of the larger set $\mathcal{Y}^{2m+1}$ of all possible mappings $M \to \mathbb{R}^{2m+1}$, which could, for example, involve multiple time series and their lags.
Takens explicitly refers only to the unlagged y as an observation function, but in its most general sense an observation function is any $y: M \to \mathbb{R}$. Thus, the functions $y(\varphi(x)), y(\varphi^2(x)), \ldots$, corresponding to the lags of the time series, are technically observation functions as well. This bears mention because, in the more general case of mappings $\Phi: M \to \mathbb{R}^{2m+1}$, the observation functions making up the components of $\Phi$ are not all derived from a single time series but can be various lags of multiple time series. To treat these cases, it is necessary to acknowledge that these are all observation functions, and we will refer to distinct time series as "unlagged" observation functions.
For a mapping $\Phi$ in the larger set $\mathcal{Y}^{2m+1}$ of all mappings $M \to \mathbb{R}^{2m+1}$, consider the case with 2m+1 component functions. The question arises whether general multivariate mappings $\Phi(x) = (y_1(x), y_2(x), \ldots, y_{2m+1}(x))$ form legitimate embeddings. Here we present two theorems: one demonstrating that maps created from 2m+1 distinct observation functions are generically embeddings, and another showing that maps created from lags of multiple observation functions are also generically embeddings. Both theorems generalize Takens' theorem, in which the component functions involve only a single observation function.
It follows from Whitney [38] that a generic $\Phi \in \mathcal{Y}^{2m+1}$ is an embedding. Note, however, that Whitney's work does not apply to the specific subsets of $\mathcal{Y}^{2m+1}$ involving fixed lagged relationships, as discussed by Takens for reconstructing attractor manifolds M of dynamic systems. That is, Whitney's theorem is generic over all of $\mathcal{Y}^{2m+1}$ and does not address these specific subsets, which have "measure zero" (e.g. in the sense of "shy" defined in [30]). To tackle this problem, we look to the proof of Takens and see that it can be readily generalized to other subsets of $\mathcal{Y}^{2m+1}$, including the case of generic $\Phi \in \mathcal{Y}^{2m+1}$.
Recall that, for a compact manifold, a mapping that is both an immersion and injective is necessarily an embedding. Thus, Takens' general approach was to show first that (i) immersions are dense in the set of mappings $\{\Phi_{(\varphi,y)}\}$, and then that (ii) there is a dense set of 1:1 mappings within this set of immersions. Since the set of embeddings is open in the set of all possible mappings, Takens concludes that mappings in $\{\Phi_{(\varphi,y)}\}$ are generically embeddings. The critical word here is "generically," meaning there can be exceptions (and, as explained in [30], the set of such exceptions need not have zero measure).
To demonstrate both (i) and (ii), Takens argues that even when the property of interest (e.g. the 1:1 property) does not hold for some particular mapping, an arbitrarily small perturbation yields a nearby mapping for which that property holds. The key to the theorem, and also to adapting it to other sets of mappings, is finding how to make these perturbations. The proof is most straightforward for the general case involving 2m+1 distinct observation functions (each a distinct time series), because the component functions $\langle y_k \rangle$ of $\Phi$ can then be perturbed independently. We therefore begin with this proof, to add clarity to the more powerful main result, Theorem 2, which involves lags of multiple observation functions.

Figure 1. The Lorenz attractor [37] shown with three shadow manifolds created from lag-coordinate transformations. The typical parameters were used: $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$, giving the three coupled equations $\dot{x} = 10(y - x)$, $\dot{y} = 28x - xz - y$, and $\dot{z} = xy - (8/3)z$. The solution was computed using a fourth-order Runge-Kutta method with a time step of $dt = 0.01$, and the time lag used to create the shadow manifolds was $\tau = 8\,dt = 0.08$. (A) The trajectory shown in the x, y, and z coordinates of the original system reveals a two-lobed manifold. (B) A univariate transformation using time lags of the y-coordinate, $\Phi = (y(t), y(t-\tau), y(t-2\tau))$, preserves this two-lobed structure (and other topological properties), illustrating Takens' theorem. (C) A univariate transformation using time lags of the z-coordinate, $\Phi = (z(t), z(t-\tau), z(t-2\tau))$, does not preserve the two-lobed structure. Local neighborhoods of the original attractor are, however, preserved. Thus, although this mapping violates a genericity assumption of the original theorem and is not an embedding, it is an immersion of the original manifold. (D) A multivariate transformation using both the y- and z-coordinates, $\Phi = (y(t), y(t-\tau), z(t))$, fulfills the assumptions of Theorems 2 and 7. As predicted, it also preserves the two-lobed structure of the Lorenz attractor and is a valid embedding. doi:10.1371/journal.pone.0018295.g001
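Theorem 1. Consider a compact, m-dimensional manifold M and a set of 2m+1 observation functions $\langle y_1, \ldots, y_{2m+1} \rangle$, where each $y_k: M \to \mathbb{R}$ is smooth; by "smooth" we mean at least $C^2$. Then it is a generic property of all possible $\langle y_k \rangle$ that the mapping $\Phi_{\langle y_k \rangle}: M \to \mathbb{R}^{2m+1}$ defined as $\Phi_{\langle y_k \rangle}(x) = \langle y_1(x), y_2(x), \ldots, y_{2m+1}(x) \rangle$ is an embedding.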
Proof. Consider an arbitrary set of 2m+1 observation functions $\langle \bar{y}_k \rangle$ on M. We define a corresponding mapping $\Phi_{\langle \bar{y}_k \rangle} \in \mathcal{Y}^{2m+1}$ by letting each of these 2m+1 observation functions be one of the component functions of $\Phi_{\langle \bar{y}_k \rangle}$. Now, recall that an immersion is a map whose derivative is injective (1:1) at every point. We denote the total derivative of a function f as $Df$. If the derivative is evaluated at a particular point x in the domain of f, we write $(Df)_x$, and if $Df$ is a matrix, we denote the derivative at a particular point and along a particular tangent vector v as $(Df)_x(v)$. For any point $x \in M$, we can perturb the co-vectors $(D\bar{y}_k)_x \in T^*_x M$ independently by perturbing individual $\bar{y}_k$. By making infinitesimal perturbations at points $x \in M$ for which $\operatorname{rank}(D\Phi_{\langle \bar{y}_k \rangle})_x < m$, we can obtain a set of observables $\langle \tilde{y}_k \rangle$ arbitrarily close to $\langle \bar{y}_k \rangle$ such that $\operatorname{rank}(D\Phi_{\langle \tilde{y}_k \rangle})_x = m$ for all $x \in M$; that is, $\Phi_{\langle \tilde{y}_k \rangle}$ is an immersion. Since the set of immersions is open in the set of all mappings, there is a neighborhood $U \subset \mathcal{Y}^{2m+1}$ around this $\Phi_{\langle \tilde{y}_k \rangle}$ such that every $\Phi_{\langle y_k \rangle} \in U$ is an immersion.
Since immersions are local embeddings, we can find a $\delta > 0$ such that, on the manifold, $0 < \rho(x,x') \le \delta$ implies $\Phi_{\langle \tilde{y}_k \rangle}(x) \ne \Phi_{\langle \tilde{y}_k \rangle}(x')$. Here we depart from Takens' notation and let $\delta$ denote small separations between two points on the manifold M, to avoid confusion with the later-defined $\varepsilon$, which is used to perturb the observables; $\rho$ is any fixed metric on M. In fact, for this fixed $\delta$, there is a subset $U' \subset U$ such that for any $\langle y_k \rangle$ in $U'$ the associated map $\Phi_{\langle y_k \rangle}$ is an immersion, and $0 < \rho(x,x') \le \delta$ implies that $\Phi_{\langle y_k \rangle}(x) \ne \Phi_{\langle y_k \rangle}(x')$. Next, we show that we can find a globally 1:1 $\Phi_{\langle y_k \rangle} \in U'$ arbitrarily close to $\Phi_{\langle \tilde{y}_k \rangle}$. To do this, we construct a finite collection of subsets $\{U_i\}_{i=1}^{N}$ such that the $U_i$ are open subsets of M, the collection covers M, and $\operatorname{diam}(U_i) < \delta$ for every i. Then, we take a partition of unity $\{\lambda_i\}$ subordinate to these $U_i$, so that we can vary the value of any $\tilde{y}_k$ by an infinitesimal amount, $\tilde{y}_k \to \tilde{y}_k + \varepsilon_{ki}\lambda_i$, without altering the value of $\Phi_{\langle \tilde{y}_k \rangle}(x)$ for $x \notin U_i$.
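The partition-of-unity step can be made concrete on the simplest compact manifold, the circle. The sketch below assumes NumPy. The number of arcs, their width, the bump profile, and the test function are all hypothetical choices. It builds open arcs $U_i$ that cover the circle, smooth bump functions supported in each arc, and normalized weights $\lambda_i$ that sum to 1. It then checks that the perturbation $y \to y + \varepsilon\lambda_i$ leaves y unchanged outside $U_i$, which is the property the proof relies on.

```python
import numpy as np

N_ARCS = 12                          # hypothetical number of open arcs U_i
HALF_WIDTH = 1.5 * np.pi / N_ARCS    # > pi / N_ARCS, so the arcs cover the circle;
                                     # diam(U_i) = 2 * HALF_WIDTH would be chosen below delta
centers = 2.0 * np.pi * np.arange(N_ARCS) / N_ARCS

def circular_distance(a, b):
    d = np.abs(a - b) % (2.0 * np.pi)
    return np.minimum(d, 2.0 * np.pi - d)

def bump(theta, center):
    """Smooth bump, positive on the open arc U_i around `center` and zero outside it."""
    r = circular_distance(theta, center) / HALF_WIDTH
    out = np.zeros_like(theta)
    inside = r < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r[inside] ** 2))
    return out

def partition_of_unity(theta):
    """Rows are lambda_i(theta); each column sums to 1."""
    bumps = np.stack([bump(theta, c) for c in centers])
    return bumps / bumps.sum(axis=0)

theta = np.linspace(0.0, 2.0 * np.pi, 3000, endpoint=False)
lam = partition_of_unity(theta)

y = np.cos(theta) + 0.5 * np.sin(3.0 * theta)   # hypothetical observation function
eps, i = 1e-3, 3
y_perturbed = y + eps * lam[i]                   # y -> y + eps * lambda_i

changed = y_perturbed != y
outside_U_i = circular_distance(theta, centers[i]) >= HALF_WIDTH
print("changed outside U_i:", bool(changed[outside_U_i].any()))  # False
```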
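We now consider the mapping $\Psi: M \times M \to \mathbb{R}^{2m+1} \times \mathbb{R}^{2m+1}$ defined as $\Psi(x,x') = (\Phi_{\langle \tilde{y}_k \rangle}(x), \Phi_{\langle \tilde{y}_k \rangle}(x'))$. We define the set $W \subset M \times M$ as $W = \{(x,x') \in M \times M \mid \rho(x,x') \ge \delta\}$, so that (by our choice of $\delta$) the mapping $\Phi_{\langle \tilde{y}_k \rangle}$ is necessarily injective on the complement of W in $M \times M$. Furthermore, note that the intersection of $\Psi(W)$ with the diagonal $\Delta$ of $\mathbb{R}^{2m+1} \times \mathbb{R}^{2m+1}$ corresponds to the set of points $\{(x,x') \in W \mid \Phi_{\langle \tilde{y}_k \rangle}(x) = \Phi_{\langle \tilde{y}_k \rangle}(x')\}$, and therefore $\Psi(W) \cap \Delta = \emptyset$ is equivalent to $\Phi_{\langle \tilde{y}_k \rangle}$ being injective. Our task, then, is to perturb the manifold $\Psi(W)$ using the $\varepsilon_{ki}$ and $\varepsilon_{ki'}$ (perturbations supported near x and near x', respectively) so that it does not intersect the diagonal $\Delta$.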
At each $p = \Psi(x,x') \in \Psi(W) \cap \Delta$ we know that $\rho(x,x') \ge \delta$, so x and x' cannot belong to the same $U_i$. Consequently, varying an $\varepsilon_{ki}$ or an $\varepsilon_{ki'}$ alters the value of $\Phi_{\langle \tilde{y}_k \rangle}$ only at x or only at x', respectively. In the tangent space $T_p(\mathbb{R}^{2m+1} \times \mathbb{R}^{2m+1})$, then, the directions of the (2m+1)+(2m+1) infinitesimal changes given by the $\varepsilon_{ki}$ and $\varepsilon_{ki'}$ are all linearly independent (indeed orthogonal), and as such they span $T_p(\mathbb{R}^{2m+1} \times \mathbb{R}^{2m+1})$. Since the tangent spaces of $\Psi(W)$ and $\Delta$ are at most 2m and 2m+1 dimensional, respectively, together they span at most 4m+1 of the 4m+2 available dimensions. We can therefore construct a vector from a linear combination of the $\partial\Psi/\partial\varepsilon_{ki}|_p$ and $\partial\Psi/\partial\varepsilon_{ki'}|_p$ that lies outside $T_p\Psi(W) + T_p\Delta$. An infinitesimal perturbation along this linear combination will move the submanifolds $\Psi(W)$ and $\Delta$ away from each other at the point p without creating a new intersection at another point. By keeping these perturbations sufficiently small, we stay confined to $U'$, so that $\Phi_{\langle y_k \rangle}$ is still an immersion. This is a more transparent statement of the transversality argument used in the Takens proof (1981).
Thus, we have shown that for any arbitrary set of 2m+1 observables $\langle \bar{y}_k \rangle$, we can find a set of observables $\langle y_k \rangle$ arbitrarily close to $\langle \bar{y}_k \rangle$ such that $\Phi_{\langle y_k \rangle}$ is an embedding; i.e., there is a dense set of observables $\{\langle y_k \rangle\}$ for which $\Phi_{\langle y_k \rangle} \in \mathcal{Y}^{2m+1}$ is an embedding. The set of embeddings is open in the set of all mappings, so this set is dense and open, meaning that the embedding property is generic over all mappings.
When mappings are confined to fixed lag relationships, Takens showed that it is valid to perturb each component of $\Phi$ independently at a given point of the domain by perturbing the unlagged observation function, y, in the other parts of the domain, namely in neighborhoods of the states $\varphi(x), \varphi^2(x)$, etc., sampled by the lagged components. This ensures that the perturbations to $\Phi$ maintain the structure of the lag relationships and that we have not inadvertently left the subset of interest. As we now show, this allows the above result to be easily extended to families of maps whose component functions are lags of multiple observation functions. This is the relevant case for many practical examples where lags of multiple time series (multiple variables or observation functions) are required to achieve a mechanistic reconstruction of M (e.g. [20]). It also allows information on M to be leveraged when the time series are short, as is the case in many physical and biological problems [22,36].
Before starting the proof, however, we must clarify exactly what the "subsets of interest" are. We define these sets as follows. First, we say $y_q$ is a lag of the observable y if we can write $y_q = y \circ \varphi^{b}$ for some positive integer b. We consider lags in the positive time direction only to simplify notation in the proof, noting that the results apply equally to negative lags. Let $\mathbf{r} = \{r_1, r_2, \ldots\}$ be the subset of the indices $k = 1, \ldots, 2m+1$ for which $y_r$, $r \in \mathbf{r}$, is an unlagged observable, i.e. $y_r$ is not a lag of another $y \in \langle y_k \rangle$. Now define a set $C_r$ for each $r \in \mathbf{r}$ that contains $y_r$ and every other observation function in $\langle y_k \rangle$ that is a lag of it. That is, $C_r$ is the set of $y_q \in \langle y_k \rangle$ that are lags of $y_r$, given as $y_q = y_r \circ \varphi^{b_q}$, where the lags $b_q$ are distinct for fixed r. This choice of $C = \{C_r : r \in \mathbf{r}\}$ and $b = \{b_k : k = 1, \ldots, 2m+1\}$ determines a subset $\tilde{\mathcal{Y}}^{2m+1}_{C,b} \subset \mathcal{Y}^{2m+1}$ containing all choices of 2m+1 observables $\langle y_k \rangle$ that obey the prescribed lag relationships under a dynamical system $\varphi$. Note that each element of $\tilde{\mathcal{Y}}^{2m+1}_{C,b}$ can be identified by the dynamical system and the $y_r$; we therefore denote such an element as $(\varphi, \langle y_r \rangle)$.
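The lag structure defined above translates directly into how reconstruction vectors are assembled from data. That structure consists of the sets $C_r$ of lags of each unlagged observable $y_r$, where the lags $b_q$ are arbitrary and possibly non-consecutive. The helper below is a hypothetical sketch. It assumes NumPy and series sampled at a common unit time step, and the function name and dictionary layout are our own. It uses past lags t − b, the usual time-series convention. The text states its results for positive lags $y_r \circ \varphi^{b}$ but notes that they apply equally to negative lags. The helper does not check that the number of columns equals 2m+1, because m is generally unknown in practice.

```python
import numpy as np

def multivariate_embedding(series, lags):
    """Stack lagged copies of several time series into reconstruction vectors.

    series: dict mapping a variable name to a 1-D array sampled at unit time steps.
    lags:   dict mapping the same names to a list of distinct integer lags (in samples);
            lag 0 stands for the unlagged observable itself, and lags need not be
            consecutive. The total number of lags plays the role of 2m + 1.
    Returns an array whose row for sample t is (series[r][t - b] for each r and each b).
    """
    max_lag = max(max(bs) for bs in lags.values())
    n_samples = min(len(series[name]) for name in lags)
    t = np.arange(max_lag, n_samples)
    columns = []
    for name, bs in lags.items():
        if len(set(bs)) != len(bs):
            raise ValueError(f"lags for {name!r} must be distinct")
        for b in bs:
            columns.append(np.asarray(series[name])[t - b])
    return np.column_stack(columns)

# Usage: one lagged pair from "y" (lags 0 and 3, non-consecutive) plus unlagged "z".
clock = np.arange(10.0)
demo = multivariate_embedding({"y": clock, "z": 100.0 + clock}, {"y": [0, 3], "z": [0]})
print(demo[0])  # row for t = 3: [y(3), y(0), z(3)]
```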
Theorem 2. Consider a diffeomorphism $\varphi: M \to M$ on some compact manifold M of dimension m, along with 2m+1 smooth observation functions $y_k: M \to \mathbb{R}$; by "smooth" we mean at least $C^2$. Restrict the $y_k$ to have the lag relationships corresponding to a collection of sets C and lags b under the dynamical system $\varphi$, and impose the following generic [12,39] properties on $\varphi$:
1. The set A of periodic points with period $p < \max(b_k)$ has finitely many points.
2. The eigenvalues of $(D\varphi^{b})_x$ at each $x \in A$, where b is the period of x, are distinct and not equal to 1.
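Then, for generic $\langle y_k \rangle \in \tilde{\mathcal{Y}}^{2m+1}_{C,b}$, the mapping $\Phi_{(\varphi,\langle y_r \rangle)}: M \to \mathbb{R}^{2m+1}$ described by $\Phi_{(\varphi,\langle y_r \rangle)}(x) = \langle y_1(x), y_2(x), \ldots, y_{2m+1}(x) \rangle$, where each $y_k$ is either an unlagged $y_r$ or a lag $y_r \circ \varphi^{b_k}$ of one, is an embedding.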
Proof. The proof of this theorem closely follows the logic of the previous proof and the original argument of Takens [12]. As noted above, any perturbations to $\Phi$ made via its component functions must keep us within $\tilde{\mathcal{Y}}^{2m+1}_{C,b}$ (the set of observables having the desired lag relationships under $\varphi$ prescribed by the $C_r$ and the $b_q$).
Here we must also deal with points of M that are fixed or periodic under the dynamical system $\varphi$, i.e. the points for which there exists a b such that $\varphi^{b}(x) = x$ (including the fixed-point case, b = 1). The above proof shows that the mapping $\Phi_{\langle y_k \rangle}$ is generically an immersion because the co-vectors $(Dy_k)_x \in T^*_x M$ can be independently perturbed. This is also true at non-periodic points where there are fixed lag relationships between some observables, since we can perturb $y_r$ in a neighborhood of $\varphi^{b_q}(x)$, and thus perturb $y_q = y_r \circ \varphi^{b_q}$ near x, without affecting $y_r$ in the neighborhood of x.
Note that periodic points x can exist such that the period b, or some integer multiple of it, $n \cdot b$, is the fixed time lag between two observables $y_{q_1}, y_{q_2} \in \langle y_k \rangle$ belonging to the same $C_r$. Let $V \subset M$ be a compact neighborhood of all such points. For $x \in V$, the co-vectors $(Dy_{q_1})_x$ and $(D(y_{q_1} \circ \varphi^{n \cdot b}))_x$ cannot necessarily be perturbed independently. Nonetheless, while $y_{q_1}(x) = y_{q_1}(\varphi^{n \cdot b}(x))$ at such a point, it is not generally true that $(Dy_{q_1})_x = (D(y_{q_1} \circ \varphi^{n \cdot b}))_x$. By assumption, for each $x \in V$, the eigenvalues of $(D\varphi^{b})_x$ are distinct and not equal to 1. Thus, by the chain rule, $(Dy_{q_1})_x$ and $(D(y_{q_1} \circ \varphi^{n \cdot b}))_x$ are linearly independent for generic $y_{q_1}$. As noted above, all the other $(Dy_k)_x$ can be perturbed independently, so we can find a set of observables $\langle \bar{y}_k \rangle$ arbitrarily near $\langle y_k \rangle$ in $\tilde{\mathcal{Y}}^{2m+1}_{C,b}$ that gives an immersion, together with a neighborhood in $\tilde{\mathcal{Y}}^{2m+1}_{C,b}$ around this $\langle \bar{y}_k \rangle$ in which every set of observables gives an immersion.
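The chain-rule step in the preceding paragraph can be written out explicitly. Write $A = (D\varphi^{b})_x$ for a point x of period b. Then $\varphi^{b}(x) = x$, and every intermediate point of $\varphi^{nb}$ that is a multiple of b returns to x:

$$
\big(D(y_{q_1}\circ\varphi^{nb})\big)_x
  = (Dy_{q_1})_{\varphi^{nb}(x)}\,(D\varphi^{nb})_x
  = (Dy_{q_1})_x\,\underbrace{(D\varphi^{b})_x\cdots(D\varphi^{b})_x}_{n\ \text{factors}}
  = (Dy_{q_1})_x\,A^{n}.
$$

The two co-vectors $(Dy_{q_1})_x$ and $(Dy_{q_1})_x A^{n}$ are therefore linearly dependent exactly when $(Dy_{q_1})_x$ is zero or a left eigenvector of $A^{n}$. In the simplest situation, where $A^{n}$ also has distinct eigenvalues, those eigenvectors span at most m lines in $T^*_x M$, so a co-vector in general position avoids them. This expansion is our reading of the step and is not a substitute for the argument in [12].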
We must also ensure that $\Phi_{(\varphi,\langle y_r \rangle)}$ is injective. The proof above relied on the ability to independently perturb the manifold $\Psi(W) \subset \mathbb{R}^{2m+1} \times \mathbb{R}^{2m+1}$ at any point $(x,x')$ by an infinitesimal amount in any coordinate direction. For a periodic point on M with period b and two observables related as $y_q$ and $y_q \circ \varphi^{n \cdot b}$, it is impossible to perturb $\Psi(W)$ locally in the coordinate $y_q(x)$ or $y_q(x')$ alone, since doing so also perturbs $y_q \circ \varphi^{n \cdot b}(x)$ or $y_q \circ \varphi^{n \cdot b}(x')$. By assumption, the set A of such periodic points has a finite number of elements. For such a generic $\varphi$ and any set $\langle y_k \rangle \in \tilde{\mathcal{Y}}^{2m+1}_{C,b}$, any neighborhood of $\langle y_k \rangle$ will contain a set of observables $\langle \bar{y}_k \rangle$ for which the unlagged observation functions $\langle \bar{y}_r \rangle$ take distinct values at each point of A.
We first perturb the $y_r$ to find an open neighborhood of observables that give immersions when restricted to the set V. We then further perturb the observables to find, within this neighborhood, a set of observables $\langle \bar{y}_k \rangle$ for which $\Phi_{(\varphi,\langle \bar{y}_r \rangle)}|_V$ is also injective and therefore an embedding (on $V \subset M$). Since the set of embeddings is open in the space of all mappings, there is a neighborhood $U \subset \tilde{\mathcal{Y}}^{2m+1}_{C,b}$ such that for all $(\varphi, \langle y_r \rangle) \in U$, the map $\Phi_{(\varphi,\langle y_r \rangle)}|_V$ is an embedding.
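We now show that we can find a $(\varphi, \langle \tilde{y}_r \rangle) \in U$ such that $\Phi_{(\varphi,\langle \tilde{y}_r \rangle)}$ is an embedding on all of M. We first note that at points $x \in M \setminus V$ the co-vectors $(D\bar{y}_k)_x \in T^*_x M$ can be perturbed independently, so we can find $(\varphi, \langle \tilde{y}_r \rangle) \in U$ for which $\Phi_{(\varphi,\langle \tilde{y}_r \rangle)}$ is an immersion. Because an immersion is a local embedding, there is a $\delta > 0$ such that for $x, x' \in M$, $0 < \rho(x,x') < \delta$ implies $\Phi_{(\varphi,\langle \tilde{y}_r \rangle)}(x) \ne \Phi_{(\varphi,\langle \tilde{y}_r \rangle)}(x')$. Since the set of immersions is open in the set of possible mappings, there is a neighborhood $U' \subset U$ such that for any $(\varphi, \langle y_r \rangle) \in U'$ the corresponding mapping $\Phi_{(\varphi,\langle y_r \rangle)}$ is an immersion. Thus, for the same $\delta$ as above, $0 < \rho(x,x') \le \delta$ implies $\Phi_{(\varphi,\langle y_r \rangle)}(x) \ne \Phi_{(\varphi,\langle y_r \rangle)}(x')$. Now we need to show that there is a $(\varphi, \langle y_r \rangle) \in U'$ such that $\Phi_{(\varphi,\langle y_r \rangle)}$ is also injective on M. As noted in the first proof, this is equivalent to $\Psi(W) \cap \Delta = \emptyset$ for the mapping $\Psi: M \times M \to \mathbb{R}^{2m+1} \times \mathbb{R}^{2m+1}$ defined as $\Psi(x,x') = (\Phi_{(\varphi,\langle y_r \rangle)}(x), \Phi_{(\varphi,\langle y_r \rangle)}(x'))$. If x and x' are both in V, or if $0 < \rho(x,x') \le \delta$, we already know that $\Phi_{(\varphi,\langle y_r \rangle)}(x) \ne \Phi_{(\varphi,\langle y_r \rangle)}(x')$. Thus we restrict ourselves to the set $W = \{(x,x') \in M \times M : x, x' \notin V \text{ and } \rho(x,x') > \delta\}$, and we construct a finite collection of open sets $\{U_i\}$ covering M whose members are small enough that, for every $(x,x') \in W$, no two of $x$, $\{\varphi^{b_q}(x)\}$, $x'$, $\{\varphi^{b_q}(x')\}$ belong to the same $U_i$.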
Take a partition of unity $\{\lambda_i\}$ subordinate to this $\{U_i\}$. Because of the way we constructed the $\{U_i\}$, we can vary the value of each $\tilde{y}_k$ at any point $x \in M \setminus V$ by an infinitesimal amount without altering the value of the other $\tilde{y}_k$ in the neighborhood of x. We make this explicit as follows. To perturb the $y_r$, we take $\tilde{y}_r \to \tilde{y}_r + \varepsilon_{ri}\lambda_i$ for the i with $x \in U_i$. To perturb the other $y_k$ (with $y_k = y_r \circ \varphi^{b_q}$ for some r), we take $\tilde{y}_r \to \tilde{y}_r + \varepsilon_{ri}\lambda_i$ for the i with $\varphi^{b_q}(x) \in U_i$. Consider the 2m+1 perturbations $\varepsilon_{ri}$, which are independent shifts at x in distinct $y_k$. In $\mathbb{R}^{2m+1} \times \mathbb{R}^{2m+1}$, the corresponding tangent vectors $\partial\Psi/\partial\varepsilon_{ri}|_p$ are linearly independent, so, as in the proof of Theorem 1, a linear combination of them can be chosen to lie outside $T_p\Psi(W) + T_p\Delta$, and this combination can be used to perturb $\Psi(W)$ away from $\Delta$. By keeping variations in the $\varepsilon_{ri}$ sufficiently small, we can find a set of $\langle y_k \rangle$ such that $(\varphi, \langle y_r \rangle) \in U'$ and $\Psi(W) \cap \Delta = \emptyset$ (where $\Psi$ now corresponds to the map $\Phi_{(\varphi,\langle y_k \rangle)}$). This pair gives a mapping $\Phi_{(\varphi,\langle y_k \rangle)}$ that is both an immersion and injective, and thus is an embedding. Because $U'$ was an arbitrarily small neighborhood of any point in $\tilde{\mathcal{Y}}^{2m+1}_{C,b}$, embeddings are dense in $\tilde{\mathcal{Y}}^{2m+1}_{C,b}$. Since the set of embeddings is also open in the set of mappings, the map $\Phi_{(\varphi,\langle y_r \rangle)}$ is generically an embedding.
Just as Takens extends the original result for discrete time to dynamical systems in continuous time, we can extend our result as follows.
Corollary 3. Consider a smooth vector field X on some compact manifold M, along with 2m+1 smooth observables $y_k: M \to \mathbb{R}$; by "smooth" we mean at least $C^2$. Define $\varphi_\tau$ as the time-$\tau$ flow of X. Suppose we restrict the $y_k$ to have the lag relationships corresponding to a collection of sets $C_r$ and lags $b_q$ under the discrete dynamical system $\varphi_\tau$, where $\tau$ is a constant. We impose the following generic properties on X:

1. For points x such that $X(x) = 0$, the eigenvalues of $(D\varphi_\tau)_x$ are distinct and not equal to 1.

2. No periodic integral curve of X has integer period $\le 2m+1$.
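Then, for generic $\langle y_k \rangle \in \tilde{\mathcal{Y}}^{2m+1}_{C,b}$, the mapping $\Phi_{(\varphi_\tau,\langle y_r \rangle)}: M \to \mathbb{R}^{2m+1}$ described by $\Phi_{(\varphi_\tau,\langle y_r \rangle)}(x) = \langle y_1(x), y_2(x), \ldots, y_{2m+1}(x) \rangle$ is an embedding.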
Proof. In this case, $\varphi_\tau$ is a discrete-time dynamical system on M satisfying the conditions imposed in Theorem 2, and the corollary follows directly.

A Theorem in the Style of Sauer et al.: The Prevalent Case
We now give an explicit proof of Remark 2.9 from [29] using the framework constructed in that paper, extending the language to cover reconstructions using non-consecutive lags (from multiple time series). The proof uses Lemmas 4.1, 4.6, and 4.11 from [29] to show that 1:1 mappings and immersions are prevalent in the space $\tilde{\mathcal{Y}}^{2m+1}_{C,b}$, just as Sauer et al. use Lemma 4.6 to prove Theorem 3.3, and Lemmas 4.1 and 4.11 to prove Theorem 3.5. These lemmas are stated below; for the proofs, see the original paper.
Lemma 4. (Originally part 2 of 4.1) Let n and k be positive integers, $x_1, \ldots, x_n$ distinct points in $\mathbb{R}^k$, $u_1, \ldots, u_n$ in $\mathbb{R}$, and $v_1, \ldots, v_n$ in $\mathbb{R}^k$. Then there exists a polynomial h in k variables of degree at most n such that $\nabla h(x_i) = v_i$ for $i = 1, \ldots, n$.
Lemma 5. (Originally 4.6) Let A be a compact subset of $\mathbb{R}^k$. Let $\Phi_0, \Phi_1, \ldots, \Phi_t: A \to \mathbb{R}^n$ be Lipschitz maps. For each integer $r \ge 0$, let $S_r$ be the set of pairs $x_1 \ne x_2$ in A for which the $n \times t$ matrix $\big(\Phi_1(x_1) - \Phi_1(x_2) \;\cdots\; \Phi_t(x_1) - \Phi_t(x_2)\big)$ has rank r, and let $d_r = \underline{\operatorname{boxdim}}(S_r)$ (the lower box-counting dimension). Define $\Phi_\alpha = \Phi_0 + \sum_{i=1}^{t} \alpha_i \Phi_i: A \to \mathbb{R}^n$. If $d_r < r$ for all integers $r \ge 0$, then for $\alpha = (\alpha_1, \ldots, \alpha_t)$ outside a measure-zero subset of $\mathbb{R}^t$, the map $\Phi_\alpha$ is 1:1.
Lemma 6. (Originally 4.11) Let A be a compact subset of a smooth manifold embedded in $\mathbb{R}^k$. Let $\Phi_0, \Phi_1, \ldots, \Phi_t$ be a set of smooth maps from an open neighborhood U of A to $\mathbb{R}^n$. For each positive integer r, let $S_r$ be the subset of the unit tangent bundle $S(A)$ such that the $n \times t$ matrix $\big(D\Phi_1(x)(v) \;\cdots\; D\Phi_t(x)(v)\big)$ has rank r, and let $d_r = \underline{\operatorname{boxdim}}(S_r)$. Define $\Phi_\alpha = \Phi_0 + \sum_{i=1}^{t} \alpha_i \Phi_i: A \to \mathbb{R}^n$. If $d_r < r$ for all integers $r \ge 0$, then for almost every $\alpha \in \mathbb{R}^t$, the map $\Phi_\alpha$ is an immersion on A.
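Lemma 4 is an existence statement. To give some intuition, the sketch below sets up, for a small hypothetical case with n = 3 distinct points in $\mathbb{R}^2$, the linear system that the gradient conditions $\nabla h(x_i) = v_i$ impose on the coefficients of a polynomial of degree at most n. It assumes NumPy, and the monomial basis and least-squares solve are our choices rather than the argument of Sauer et al. Because the lemma guarantees that an exact solution exists, the least-squares residual should be at round-off level.

```python
import itertools
import numpy as np

def monomial_exponents(k, degree):
    """Exponent tuples (e_1, ..., e_k) with 1 <= e_1 + ... + e_k <= degree.

    The constant monomial is omitted because its gradient is identically zero."""
    return [e for e in itertools.product(range(degree + 1), repeat=k)
            if 1 <= sum(e) <= degree]

def gradient_block(point, exponents):
    """k x len(exponents) matrix: entry (j, c) is d/dx_j of monomial c at `point`."""
    k = len(point)
    block = np.zeros((k, len(exponents)))
    for c, e in enumerate(exponents):
        for j in range(k):
            if e[j] > 0:
                reduced = np.array(e)
                reduced[j] -= 1
                block[j, c] = e[j] * np.prod(point ** reduced)
    return block

def polynomial_with_gradients(points, gradients):
    """Coefficients of a polynomial h of degree <= n with grad h(x_i) = v_i (least squares)."""
    points = np.asarray(points, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    n, k = points.shape
    exponents = monomial_exponents(k, n)
    system = np.vstack([gradient_block(p, exponents) for p in points])
    target = gradients.reshape(-1)
    coeffs, *_ = np.linalg.lstsq(system, target, rcond=None)
    residual = np.max(np.abs(system @ coeffs - target))
    return exponents, coeffs, residual

# Usage: n = 3 distinct points in R^2 with randomly prescribed gradients v_i.
rng = np.random.default_rng(0)
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
vs = rng.normal(size=(3, 2))
exponents, coeffs, residual = polynomial_with_gradients(pts, vs)
print(f"max |grad h(x_i) - v_i| = {residual:.1e}")
```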
To apply these lemmas, it is necessary to restrict the dimension of the sets of periodic orbits, that is, the sets $A_p = \{x \in A : \varphi^{p}(x) = x\}$ for $p < \max_k b_k$. For the case of consecutive lags, Sauer et al. state a sufficient condition to be $\operatorname{boxdim}(A_p) < p/2$. A sufficient condition for non-consecutive lags is a bit more complicated. Define the constant $B_{pr}$ as the number of $y_q \in C_r$ such that $b_q = j \cdot p + b_{q'}$ for at least one other $y_{q'} \in C_r$ and some positive integer j. Also, define $B_p = \sum_r B_{pr}$. A sufficient condition on the $A_p$ is then $2 \cdot \operatorname{boxdim}(A_p) < n - B_p$, where n is the total number of observation functions.
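The constants $B_{pr}$ can be computed mechanically from the lag sets. The plain-Python sketch below is hypothetical and reflects our reading of the definition: $q' \ne q$, and the multiple j is a positive integer. It returns the bound $n - B_p$ that appears in the sufficient condition above.

```python
def count_B_p(lags_by_series, p):
    """B_p = sum over r of B_{pr}.

    B_{pr} counts the lags b_q in C_r for which b_q = j * p + b_{q'} holds for some
    other lag b_{q'} in C_r and some positive integer j (our reading of the definition).
    lags_by_series: dict mapping each unlagged observable r to its distinct lags, 0 included.
    """
    total = 0
    for lags in lags_by_series.values():
        for b_q in lags:
            if any(b_q > b_other and (b_q - b_other) % p == 0 for b_other in lags):
                total += 1
    return total

def periodic_point_bound(lags_by_series, p):
    """Right-hand side n - B_p of the condition 2 * boxdim(A_p) < n - B_p."""
    n = sum(len(lags) for lags in lags_by_series.values())
    return n - count_B_p(lags_by_series, p)

# Consecutive lags 0..6 of a single series (n = 7): the bound reduces to p.
single = {"y": list(range(7))}
print([periodic_point_bound(single, p) for p in range(1, 7)])  # -> [1, 2, 3, 4, 5, 6]
# Non-consecutive lags from two series.
mixed = {"y": [0, 2], "z": [0, 1]}
print(periodic_point_bound(mixed, 2))  # -> 3
```

For consecutive lags 0, 1, ..., n − 1 of a single series, every lag b ≥ p has the partner b − p. Hence $B_p = n - p$, and the condition becomes $2 \cdot \operatorname{boxdim}(A_p) < p$, i.e. $\operatorname{boxdim}(A_p) < p/2$, which is the condition of Sauer et al. quoted above.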
Theorem 7. Let $\varphi$ be a diffeomorphism on an open subset $U$ of $\mathbb{R}^m$, and let $A$ be a compact subset of $U$ with $\mathrm{boxdim}(A) = d$. Let $\mathcal{C}$ be a collection of sets and $\mathbf{b}$ a set of lag relationships as above, such that $n = \sum_r n_r > 2d$. Assume that for every positive integer $p \le \max(\mathbf{b})$, the set $A_p$ of periodic points of period $p$ satisfies $2 \cdot \mathrm{boxdim}(A_p) < n - B_p$, and that for each point of $A_p$, the Jacobian $D\varphi^p$ has distinct eigenvalues. Then, for almost every set of $n$ observation functions $\{y_k\}$ satisfying the given lag relationships, the map $\Phi_{(\varphi,\langle y_r\rangle)}$ is an embedding on $A$.
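The dimension hypotheses of Theorem 7 can be checked mechanically once estimates of $\mathrm{boxdim}(A)$ and $\mathrm{boxdim}(A_p)$ are supplied. The sketch below does only that; it does not check the eigenvalue condition, and the dimension values are user-supplied assumptions (for example, roughly 1.26 is a commonly quoted estimate for the Hénon attractor, and isolated periodic points have dimension 0). Periods missing from the dictionary are treated as having no periodic points. Function names are hypothetical.

```python
def B_p(lag_sets, p):
    # per observation function: number of lags minus number of distinct residues mod p
    return sum(len(set(c)) - len({b % p for b in c}) for c in lag_sets)

def theorem7_dimension_conditions(lag_sets, box_dim, periodic_box_dims):
    """True when n > 2d and 2*boxdim(A_p) < n - B_p for every listed p <= max lag.

    lag_sets: one list of lags per observation function (the sets C_r).
    box_dim: estimate of d = boxdim(A).
    periodic_box_dims: dict mapping a period p to an estimate of boxdim(A_p).
    """
    n = sum(len(set(c)) for c in lag_sets)
    if not n > 2 * box_dim:
        return False
    max_lag = max(max(c) for c in lag_sets)
    for p in range(1, max_lag + 1):
        if p in periodic_box_dims and not 2 * periodic_box_dims[p] < n - B_p(lag_sets, p):
            return False
    return True
```

For instance, two observation functions with lags `[0, 2]` and `[0]` give $n = 3 > 2 \times 1.26$, and isolated fixed points and period-2 points pass the periodic-orbit condition.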
Proof. Without loss of generality, assume we have ordered the components of $\Phi_{(\varphi,\langle y_r\rangle)}$ with $y_{r_1}$ and all its lags first, then $y_{r_2}$ and its lags, and so on; that is, the first $n_{r_1}$ components are lagged values of $y_{r_1}$, the next $n_{r_2}$ are lagged values of $y_{r_2}$, etc. To show prevalence, we find a suitable probe space (see [29]). For the univariate theorem, the infinite-dimensional space is the space of smooth observation functions $y: U \to \mathbb{R}$. For maps constructed from multiple observation functions and their lags, this becomes the space of sets of $s_r = \mathrm{size}(\mathbf{r})$ unlagged observation functions. Sauer et al. take the probe space for the univariate theorem to be any set $H$ of polynomials in $m$ variables that includes all polynomials up to degree $2n$. Here we need a set of polynomials for each of the $y_r$, so we take the probe space for this theorem to be the Cartesian product of $s_r$ copies of $H$.
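If the monomials are taken as the basis $\{h_1,\ldots,h_t\}$ of $H$ (one valid choice, assumed here for illustration), then $t = \binom{m+2n}{m}$, and the product probe space has $s_r \cdot t$ coefficients $\alpha_{r,i}$. The short sketch below computes these sizes and cross-checks $t$ by brute-force enumeration; the function names are hypothetical.

```python
from itertools import product
from math import comb

def probe_basis_size(m, n):
    """t = number of monomials in m variables of total degree <= 2n."""
    return comb(m + 2 * n, m)

def probe_space_dimension(m, n, s_r):
    """Number of coefficients alpha_{r,i}: one length-t vector per observation function."""
    return s_r * probe_basis_size(m, n)

def brute_force_basis_size(m, n):
    """Count exponent tuples directly, as a check on the binomial formula."""
    return sum(1 for e in product(range(2 * n + 1), repeat=m) if sum(e) <= 2 * n)
```

For $m = 2$ and $n = 3$ this gives $t = 28$, so two observation functions already require 56 perturbation coefficients.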
Let $\{h_1,\ldots,h_t\}$ be a basis for $H$. We want to show that for almost all choices of the $s_r \times t$ coefficients $\alpha_{r,i}$, the map $\Phi_{(\varphi,\langle\tilde y_r\rangle)}$ defined by the observation functions $\tilde y_r = y_r + \sum_{i=1}^{t} \alpha_{r,i} h_i$ is an embedding. We first demonstrate that almost every $\Phi_{(\varphi,\langle\tilde y_r\rangle)}$ is 1:1, proceeding as in the proof of Theorem 4.3 in [29].
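To apply Lemma 5 sensibly, we adopt the following convention: think of $\Phi_{(\varphi,\langle\tilde y_r\rangle)}$ as a perturbation of $\Phi_{(\varphi,\langle y_r\rangle)}$ that is the summed effect of perturbations on each $y_r$ separately. For each pair $(r,i)$ with $r \in \mathbf{r}$ and $i \in \{1,\ldots,t\}$, there is a map $\Phi_{r,i}: U \to \mathbb{R}^n$, namely $\Phi_{(\varphi,\langle\tilde y_{r'}\rangle)}$ with $\tilde y_{r'} = h_i$ if $r = r'$ and $\tilde y_{r'} = 0$ otherwise. The components of $\Phi_{r,i}(x)$ are either 0 or of the form $h_i(\varphi^{b_q}(x))$. Since each component of $\Phi$ is linear in the observation function that generates it, $\Phi_{(\varphi,\langle\tilde y_r\rangle)} = \Phi_{(\varphi,\langle y_r\rangle)} + \sum_r \sum_{i=1}^{t} \alpha_{r,i}\Phi_{r,i}$, which matches the structure of Lemma 5.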
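The decomposition $\Phi_{(\varphi,\langle\tilde y_r\rangle)} = \Phi_{(\varphi,\langle y_r\rangle)} + \sum_r \sum_i \alpha_{r,i}\Phi_{r,i}$ rests only on the delay map being linear in its observation functions, which can be verified numerically. The sketch below assumes, purely for illustration, the Hénon map with its standard parameters as $\varphi$, the coordinate functions as $y_1, y_2$, two quadratic monomials as basis elements $h_i$, and arbitrary coefficients $\alpha_{r,i}$; none of these choices come from the text, and all names are hypothetical.

```python
import numpy as np

A_PAR, B_PAR = 1.4, 0.3  # standard Henon parameters, an illustrative choice of phi

def henon(z):
    return np.array([1.0 - A_PAR * z[0] ** 2 + z[1], B_PAR * z[0]])

def phi_power(z, b):
    for _ in range(b):
        z = henon(z)
    return z

def delay_map(z, observations, lags):
    """Phi(z): observations[r](phi^b(z)) for each r and each lag b in lags[r], grouped by r."""
    return np.array([float(observations[r](phi_power(z, b)))
                     for r in range(len(observations)) for b in lags[r]])

def perturb(y, coeffs, basis):
    """y~ = y + sum_i alpha_i h_i."""
    return lambda z: y(z) + sum(a * h(z) for a, h in zip(coeffs, basis))

def zero(z):
    return 0.0

def check_linearity(z0, observations, lags, basis, alpha):
    """Return (Phi with perturbed observations, Phi + sum_{r,i} alpha_{r,i} Phi_{r,i})."""
    s = len(observations)
    lhs = delay_map(z0, [perturb(observations[r], alpha[r], basis) for r in range(s)], lags)
    rhs = delay_map(z0, observations, lags)
    for r in range(s):
        for i, h in enumerate(basis):
            # Phi_{r,i}: observation r replaced by h_i, every other observation set to 0
            probe = [h if rr == r else zero for rr in range(s)]
            rhs = rhs + alpha[r][i] * delay_map(z0, probe, lags)
    return lhs, rhs

observations = [lambda z: z[0], lambda z: z[1]]         # hypothetical y_1, y_2
lags = [[0, 2], [1]]                                    # non-consecutive lags for y_1
basis = [lambda z: z[0] * z[1], lambda z: z[0] ** 2]    # two hypothetical h_i
alpha = np.array([[0.3, -0.1], [0.05, 0.2]])
lhs, rhs = check_linearity(np.array([0.1, 0.2]), observations, lags, basis, alpha)
```

The two vectors agree to rounding error, as the decomposition requires.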
We now check that the rank of the matrix $M_{x_1x_2}$ satisfies the conditions of Lemma 5 for each pair of distinct $x_1, x_2 \in A$. To avoid confusion with the previous section of this paper and with Takens' original work, we continue to use row vectors to describe the maps $\Phi$; Sauer et al. [29] prefer column vectors, so transposes are needed in several places. Thus, $M_{x_1x_2}$ is the $n \times (s_r t)$ matrix whose columns, ordered first by $r$, are $\Phi_{r,i}(x_1)^T - \Phi_{r,i}(x_2)^T$.
Note that $M_{x_1x_2}$ is a block diagonal matrix, so its rank equals the sum of the ranks of its blocks. Each of the $s_r$ blocks can be written as the product of two matrices, $J_r$ and $H_r$, where the entries of $H_r$ are values of a single polynomial $h$ and the entries of $J_r$ are each one of $\{1, 0, -1\}$. Note that there are multiple possible choices of $H_r$ and $J_r$ that give the same $M_{x_1x_2}$.
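The two linear-algebra facts used in the case analysis that follows, that the rank of a block diagonal matrix is the sum of the ranks of its blocks and that multiplying $J$ by an onto (full-row-rank) $H$ leaves the rank of $J$ unchanged, can be sanity-checked numerically. The blocks below are arbitrary random matrices chosen for illustration, not the actual $J_r$ and $H_r$ of the proof, and `block_diag` is a small helper we define here.

```python
import numpy as np

def block_diag(*blocks):
    """Place 2-D blocks along the diagonal of an otherwise zero matrix."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i, j = i + b.shape[0], j + b.shape[1]
    return out

rng = np.random.default_rng(0)
block_1 = rng.standard_normal((3, 5))                               # rank 3 with probability 1
block_2 = np.outer(rng.standard_normal(2), rng.standard_normal(5))  # rank 1 by construction
M = block_diag(block_1, block_2)
rank_M = np.linalg.matrix_rank(M)
block_ranks = [np.linalg.matrix_rank(b) for b in (block_1, block_2)]

# J has entries in {1, 0, -1}; its third row is the sum of the first two, so rank(J) = 2.
J = np.array([[1, 0, -1, 0], [0, 1, 0, -1], [1, 1, -1, -1]], dtype=float)
H = rng.standard_normal((4, 6))  # full row rank (onto) with probability 1
rank_JH = np.linalg.matrix_rank(J @ H)
```

The second fact holds because an onto $H$ has a right inverse $H^{+}$, so $J = (JH)H^{+}$ and the rank cannot drop.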
Case 1: First consider $x_1$ and $x_2$ that do not both lie in a periodic orbit of integer period less than $\max(\mathbf{b})$. We specify $H_r$ so that the first $n_r$ rows, where $n_r$ is the size of the set $C_r$, correspond to $h_{r,i}(x_1), h_{r,i}(\varphi^{b_{r+1}}(x_1)), \ldots, h_{r,i}(\varphi^{b_{r+n_r}}(x_1))$, and the next $n_r$ rows correspond to the $h_{r,i}(\varphi^{b_{q_r}}(x_2))$. Since $H_r$ is onto, the rank of $M_{x_1x_2}$ is just the sum of the ranks of the $J_r$. In this case $J_r$ contains a copy of $I_{n_r}$ and thus has rank $n_r$. The entire matrix $M_{x_1x_2}$ therefore has rank $n = \sum_r n_r$; since the set of such pairs has box dimension at most $2d < n$, the conditions of Lemma 5 are satisfied.
Case 2: Now consider $x_1$ and $x_2$ in separate periodic orbits with periods $p_1$ and $p_2$ such that $1 \le B_{p_1} \le B_{p_2}$ and $p_1, p_2 < \max(\mathbf{b})$. $H_r$ will have $B_{p_1 r}$ fewer rows, corresponding to the lags $b_{q_1} = m \cdot p_1 + b_{q_2}$ for some $m \in \mathbb{N}$ (there will also be a reduction in the number of rows associated with $B_{p_2}$). In this case, $J_r$ will still contain the column space of $I_{n_r - B_{p_1 r}}$, and thus $\mathrm{rank}(J) \ge \sum_r (n_r - B_{p_1 r}) = n - B_{p_1}$. Again the $H_r$ are onto, so the rank of $M_{x_1x_2}$ is the rank of $J$.
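The dimension of the set $S$ of all such pairs $x_1, x_2$ is $\dim(S) = \dim(A_{p_1}) + \max(\dim(A_{p_2}))$. By the conditions placed on the size of the $A_p$, $\dim(A_{p_1}) + \dim(A_{p_2}) < (n - B_{p_1})/2 + (n - B_{p_2})/2 \le n - B_{p_1}$, where the last step uses $B_{p_1} \le B_{p_2}$. We can therefore conclude that $\dim(S) < n - B_{p_1} \le \mathrm{rank}(M_{x_1x_2})$, and thus that Lemma 5 applies to this case as well.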
Case 3: Finally, consider $x_1$ and $x_2$ in the same $p$-periodic orbit, $p < \max(\mathbf{b})$. Now the matrix $H_r$ becomes more complicated, since some of the values $h(z)$ pertaining to $x_1$ may equal values $h(z)$ pertaining to $x_2$. Consequently, the $J_r$ are no longer guaranteed to contain the column space of the identity. Each $J_r$ does, however, contain the column space of an $(n_r - B_{pr})$-dimensional matrix with 1 along the upper diagonal and a single $-1$ off the diagonal in each column. Using elementary operations, it is possible to make the first $m$ columns of $J_r$ upper diagonal for some integer $m \ge (n_r - B_{pr})/2$. Thus, the rank of each $J_r$ is at least $(n_r - B_{pr})/2$, and the entire matrix has $\mathrm{rank}(J) \ge (n - B_p)/2$.
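One way to see where the factor of $1/2$ can come from, offered as our own illustration rather than the authors' argument, and assuming the $+1$ entries sit on the main diagonal: a $N \times N$ matrix whose column $j$ is $e_j - e_{f(j)}$ with $f(j) \ne j$ is the oriented incidence matrix of a graph with edges $\{j, f(j)\}$. Its rank is $N$ minus the number of connected components, and since no vertex is isolated, every component has at least two vertices, so the rank is at least $N/2$. The sketch below checks this numerically; the function names are hypothetical.

```python
import numpy as np

def signed_column_matrix(f):
    """Column j has +1 in row j and -1 in row f[j]; requires f[j] != j."""
    N = len(f)
    J = np.zeros((N, N))
    for j, fj in enumerate(f):
        if fj == j:
            raise ValueError("each -1 must lie off the diagonal")
        J[j, j] = 1.0
        J[fj, j] = -1.0
    return J

def count_components(f):
    """Connected components of the graph with edges {j, f[j]}, via union-find."""
    parent = list(range(len(f)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for j, fj in enumerate(f):
        parent[find(j)] = find(fj)
    return len({find(a) for a in range(len(f))})
```

A single 6-cycle, `f = [1, 2, 3, 4, 5, 0]`, gives rank 5, while three disjoint swaps, `f = [1, 0, 3, 2, 5, 4]`, attain the bound $N/2 = 3$.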
The dimension of the set $S$ of all such $x_1$ and $x_2$ is just $\dim(A_p)$. By the imposed conditions, $\dim(S) < (n - B_p)/2 \le \mathrm{rank}(M_{x_1x_2})$, and Lemma 5 applies.
Now we want to show that almost every $\Phi_{(\varphi,\langle\tilde y_r\rangle)}$ is an immersion. We check that the matrix $\left[(D\Phi_{r,i})_x(v)\right]$, with one column for each pair $(r,i)$, has full rank and thus satisfies the conditions of Lemma 6 for each $(x,v)$ in the unit tangent bundle $S(A)$. This is a block diagonal matrix with $s_r$ blocks, so it is sufficient to show that the columns of the $i$th block span $\mathbb{R}^{n_{r_i}}$ for $i = 1,\ldots,s_r$. We consider two cases.
Case 1: Consider first the subset $S'$ of $x$ that are not periodic with period $p < \max(\mathbf{b})$. The entries of each block are of the form $\nabla h(\varphi^{b}(x))^T (D\varphi^{b})_x(v)$. Since $\varphi$ is a diffeomorphism and $v \ne 0$, we know that $(D\varphi^{b})_x(v) \ne 0$. Furthermore, the points $\varphi^{b}(x)$ are distinct. Because Lemma 4 allows the gradients of $h$ at these distinct points to be prescribed independently, the columns span $\mathbb{R}^{n_r}$. The dimension of $S'$ is at most $2d - 1$, so we may apply Lemma 6.
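The entries $\nabla h(\varphi^{b}(x))^T (D\varphi^{b})_x(v)$ are directional derivatives of $x \mapsto h(\varphi^{b}(x))$ along $v$, computed by the chain rule. The sketch below checks this identity against a central finite difference, assuming for illustration the Hénon map with its standard parameters as $\varphi$ and an arbitrary quadratic observation function $h$; both choices and all function names are hypothetical.

```python
import numpy as np

A_PAR, B_PAR = 1.4, 0.3  # standard Henon parameters, an illustrative choice of phi

def henon(z):
    return np.array([1.0 - A_PAR * z[0] ** 2 + z[1], B_PAR * z[0]])

def henon_jacobian(z):
    return np.array([[-2.0 * A_PAR * z[0], 1.0], [B_PAR, 0.0]])

def power_and_jacobian(z, b):
    """Return phi^b(z) and (D phi^b)_z, accumulating Jacobians by the chain rule."""
    D = np.eye(2)
    for _ in range(b):
        D = henon_jacobian(z) @ D  # Jacobian at the current point, applied on the left
        z = henon(z)
    return z, D

def h(z):  # hypothetical observation function
    return z[0] + 0.5 * z[0] * z[1]

def grad_h(z):
    return np.array([1.0 + 0.5 * z[1], 0.5 * z[0]])

def directional_entry(x, v, b):
    """grad h(phi^b(x))^T (D phi^b)_x v, the form of each entry in the block."""
    zb, D = power_and_jacobian(x, b)
    return grad_h(zb) @ (D @ v)

def finite_difference_entry(x, v, b, eps=1e-6):
    fwd = h(power_and_jacobian(x + eps * v, b)[0])
    bwd = h(power_and_jacobian(x - eps * v, b)[0])
    return (fwd - bwd) / (2.0 * eps)
```

Because the Hénon map has constant Jacobian determinant $-0.3 \ne 0$, $(D\varphi^{b})_x(v)$ never vanishes for $v \ne 0$, matching the diffeomorphism assumption.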
Case 2: Now consider the subset $S'$ of $x$ that are periodic with period $p < \max(\mathbf{b})$. By the conditions of the theorem, $(D\varphi^{b_1})_x$ has distinct eigenvalues from $(D\varphi^{b_2})_x$. Therefore, $\nabla h(\varphi^{b_1}(x))^T (D\varphi^{b_1})_x(v) \ne \nabla h(\varphi^{b_2}(x))^T (D\varphi^{b_2})_x(v)$. Furthermore, the relationship depends on $h$, and again by Lemma 4 the columns span $\mathbb{R}^{n_r}$. The dimension of $S'$ is certainly less than $2d - 1$, so we can safely apply Lemma 6.
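The eigenvalue hypothesis on periodic points can be checked directly for a concrete map. As an illustration we assume the Hénon map with its standard parameters: its fixed points solve $a x^2 + (1-b)x - 1 = 0$ with $y = bx$, and since $D\varphi$ has trace $-2ax$ and determinant $-b$, its eigenvalues satisfy $\lambda^2 + 2ax\lambda - b = 0$, whose discriminant $4a^2x^2 + 4b$ is positive, so they are real and distinct. The sketch confirms this numerically for $p = 1$; the function names are hypothetical.

```python
import numpy as np

A_PAR, B_PAR = 1.4, 0.3  # standard Henon parameters, an illustrative choice of phi

def henon(z):
    return np.array([1.0 - A_PAR * z[0] ** 2 + z[1], B_PAR * z[0]])

def henon_jacobian(z):
    return np.array([[-2.0 * A_PAR * z[0], 1.0], [B_PAR, 0.0]])

def henon_fixed_points():
    # x = 1 - a x^2 + b x  <=>  a x^2 + (1 - b) x - 1 = 0, with y = b x
    disc = np.sqrt((1.0 - B_PAR) ** 2 + 4.0 * A_PAR)
    roots = [(-(1.0 - B_PAR) + disc) / (2.0 * A_PAR), (-(1.0 - B_PAR) - disc) / (2.0 * A_PAR)]
    return [np.array([x, B_PAR * x]) for x in roots]

def has_distinct_eigenvalues(M, tol=1e-9):
    ev = np.linalg.eigvals(M)
    return all(abs(ev[i] - ev[j]) > tol for i in range(len(ev)) for j in range(i + 1, len(ev)))
```

For longer periods the same test applies to $D\varphi^p$, obtained as the product of $D\varphi$ evaluated along the orbit.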
Theorem 7 can be extended to continuous dynamical systems (smooth vector fields $X$ on a manifold) by taking $\varphi$ in the statement of the theorem to be the time-$t$ flow map $\varphi_t$ of $X$.
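In practice, the time-$t$ flow map is only available approximately through numerical integration. The sketch below assumes, for illustration only, the Lorenz system with its standard parameters as the vector field $X$ and a fixed-step classical Runge-Kutta (RK4) scheme as the approximation of $\varphi_\tau$. It then samples an orbit and forms coordinates from two variables with non-consecutive lags. The lag choice and function names are hypothetical, and flow properties such as $\varphi_\tau \circ \varphi_\tau = \varphi_{2\tau}$ hold here only up to integration error (exactly, when the same step size is used).

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def time_tau_map(state, tau, steps=20):
    """Approximate the flow map phi_tau with fixed-step classical RK4."""
    h = tau / steps
    s = np.asarray(state, dtype=float)
    for _ in range(steps):
        k1 = lorenz(s)
        k2 = lorenz(s + 0.5 * h * k1)
        k3 = lorenz(s + 0.5 * h * k2)
        k4 = lorenz(s + h * k3)
        s = s + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return s

def sample_orbit(state, tau, count):
    """Iterate phi_tau, which plays the role of the diffeomorphism phi."""
    orbit = [np.asarray(state, dtype=float)]
    for _ in range(count - 1):
        orbit.append(time_tau_map(orbit[-1], tau))
    return np.array(orbit)

orbit = sample_orbit([1.0, 1.0, 1.0], tau=0.05, count=200)
x_series, z_series = orbit[:, 0], orbit[:, 2]
# Coordinates (x(t), x(t + 3 tau), z(t + tau)): a hypothetical non-consecutive lag choice
N = len(orbit) - 3
reconstruction = np.column_stack([x_series[:N], x_series[3:3 + N], z_series[1:1 + N]])
```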

Discussion
Theorem 1 and the more general result presented in Theorem 2 (and its corollary) were given proofs intended to follow those presented by Takens. The original "transversality" argument, however, has been replaced with what we believe is a simpler and more direct argument. These proofs clarify how perturbations to the observation functions can be constructed and highlight why $2m+1$ dimensions are necessary for mappings to be generically embeddings. Theorem 7 is similar to Theorem 2 but takes advantage of the more powerful framework, built around the notion of prevalence, established by Sauer et al. [29]. It also provides more specific conditions on the periodic orbits than Theorem 2 and thus can be applied to certain non-generic situations that Takens' original framework would exclude. Namely, the set of periodic points need not be finite (as required in Takens' original theorem and Theorem 2), so long as its dimension does not exceed the bounds stated in Theorem 7. Theorem 7 extends Remark 2.9 in [29], which we proved explicitly by determining a sufficient restriction on the periodic orbits when the lags composing $\Phi$ are not necessarily consecutive.
This work also develops a language to describe a wider family of cases for reconstructing state space manifolds from multiple observational time series, to encourage wider applicability of SSR in the natural sciences. For example, these results can be extended to another special case of interest, reconstructions using time derivatives [40], when multiple observation functions are available. The argument for this case is analogous to that used by Takens [12] for the case in which all the derivatives come from a single observation function. Furthermore, these theorems validate heuristic work using spatial lag reconstructions and mixed spatial and temporal lag reconstructions to study spatially coupled dynamics [41].
More importantly, in terms of future applications, Theorems 2 and 7 set the stage for practical reconstruction of state space manifolds from multiple observation functions. This is significant in answering objections to single-variable state space reconstruction (SSR) concerning the excessive phenomenology of lagged-coordinate embeddings [26]. These two theorems provide proof of principle for attempts to model nonlinear dynamics in the natural sciences using multiple time series (e.g. [20]), and they lay bare the rather non-restrictive assumptions required for building mechanistic models from multiple time series variables. Moreover, they support the notion of using multiple embeddings as a potentially efficient way of extracting information from time series data of limited length, but where there are potentially many simultaneous observations of dynamics on the same attractor manifold. By reducing correlations in noise between the reconstructed coordinates, these techniques should allow reconstructions to exceed the limitations placed on univariate methods [35], as heuristic examples have already suggested [20]. The potential information leverage provided by multiple embeddings possible from novel combinations of variables (and their lags) can pave the way for a plethora of new applied techniques to exploit the time-limited, but parallel observations of nature [36]. This paper is intended to complement the existing literature on SSR and help promote this potential growth area in the natural sciences.