A new data assimilation method for high-dimensional models

In variational data assimilation (VarDA), the gradient of the cost function is typically computed with the adjoint method. However, for high-dimensional models the adjoint method suffers from several problems, such as limited accuracy, difficult implementation and considerable complexity. To overcome these shortcomings, a new data assimilation method based on dual number automatic differentiation (AD) is proposed. Its important advantages lie in that the coding of a tangent-linear/adjoint model is no longer necessary and that the value of the cost function and its corresponding gradient vector can be obtained simultaneously through only one forward computation in dual number space. Numerical data assimilation experiments are implemented for a typical nonlinear advection equation and a parabolic equation. The results demonstrate that the new method can reconstruct the initial conditions of high-dimensional nonlinear dynamical systems conveniently and accurately. Additionally, the estimated initial values converge to the true values quickly, even if noise is present in the observations.


Introduction
The accuracy of numerical weather prediction (NWP) depends on exact initial and boundary conditions and a perfect prediction model. In recent years, with the development of numerical models, the quality of the initial conditions has become the bottleneck for the accuracy of NWP [1]. Data assimilation is an effective method for providing accurate and reasonable initial conditions for NWP. Data assimilation grew out of the objective analysis of initial values and has become a technique that effectively exploits large amounts of unconventional information (such as satellite and radar data) [2]. By combining a mathematical model with irregularly distributed observations in a computer program, a coordinated relationship between the data and the model is constructed according to some algorithm, from which the most likely values on regular grid points can be obtained. The resulting analysis therefore has minimal error and can provide initial conditions for prediction models [3]. In addition, data assimilation can not only effectively improve the accuracy of prediction models but also reduce the uncertainty of the initial conditions; it therefore occupies an important position in the ocean and atmospheric sciences [4]. Current data assimilation methods fall primarily into two categories. The first is sequential methods [5], such as the Kalman filter (KF), the extended Kalman filter, the ensemble Kalman filter (EnKF) and the particle filter. The second is VarDA methods [6][7][8][9][10], including the 3D and 4D variational methods (3D/4D-Var); the latter extends the former into the time dimension, and its performance is considerably better. 4D-Var is the most advanced method for ocean and weather prediction and has achieved great success in improving the accuracy of numerical predictions. It is essentially a large-scale optimization problem whose constraints are differential equations.
When solving this problem, the value of the gradient vector of the cost function needs to be provided to the optimization algorithm, which can be achieved by the reverse integration of the adjoint model [11,12]. Since 4D-Var needs to solve the adjoint model, the amount of computation is particularly large. To reduce the computation cost, an incremental method, which requires the tangent-linear model, is often used. However, the tangent-linear and adjoint models are essentially first-order differential models of the nonlinear physical system equations. For complex models, developing such differential models manually is a difficult task [11,12], and the incremental method does not guarantee convergence of the results. Moreover, coding the adjoint operator is a very complex and heavy task, and the parametrization of physical processes leads to discontinuity of the cost function.
In fact, the key issue in VarDA is calculating the partial derivative of the cost function with respect to the control variables (such as the initial values or certain parameters). Generally, there is an implicit nonlinear relationship between the cost function and the unknown parameters, which are typically connected by ordinary or partial differential equations. The numerical method commonly used to compute partial derivatives is the difference method, but the choice of the difference step is problematic. In particular, for high-dimensional models, the computation cost of the difference method is far greater than that of the adjoint method [12]. Moreover, since perturbed initial values at multiple points are needed, errors are easily introduced. At present, many data assimilation methods perform well for low-dimensional nonlinear systems, but for high-dimensional nonlinear systems the results are less satisfactory: the computation cost is large, the accuracy is not high, and in some situations the problem cannot be solved at all. To calculate partial derivatives precisely, Lyness and Moler [13] proposed the complex variable differential method (CVD), whose main idea is to convert the calculation of derivatives into a calculation in the field of complex numbers. Cao et al. [14] applied the CVD method to data assimilation problems. A new data assimilation method based on dual number theory was proposed by Cao, Huang et al. [15], and its effectiveness was verified on low-dimensional nonlinear dynamical systems and on data assimilation problems with non-differentiable prediction models. However, neither of these methods has been applied to high-dimensional nonlinear dynamical systems and data assimilation problems.
For high-dimensional optimization problems whose constraints are nonlinear dynamical prediction models, the gradient of the cost function is difficult to calculate; a data assimilation method based on dual number AD is therefore proposed. In dual number space, both the function value and the gradient vector can be obtained accurately by solving the differential equations numerically and evaluating the cost function. Compared to the VarDA method based on the adjoint equation, it therefore has some obvious advantages.
The remainder of this paper is organized as follows. First, the dual number theories and algorithm rules are introduced. Second, a new data assimilation algorithm for high-dimensional numerical models is developed by combining accurate gradient information from dual number AD with the classical optimization algorithm. Third, numerical simulations for data assimilation are implemented for a typical nonlinear advection equation and for a parabolic partial differential equation. Finally, a short conclusion is given.

Introduction of dual numbers
The dual number is defined as

x̂ = x + εx′,  (1)

where x is the real part and x′ is the dual part; both x and x′ are real numbers. The dual label ε does not denote any specific value but is defined by the properties ε ≠ 0 and ε^n = 0 (n > 1) [16,17]. For convenience, Eq (1) can also be written in the form x̂ = <x, εx′>. Two dual numbers are equal when their real parts and dual parts are both equal, and a dual number is 0 when its real part and dual part are both 0. The modulus of a dual number is defined as |x̂| = x, which can be positive or negative. The conjugate of a dual number x̂ is x − εx′. When x and x′ are extended to vectors, a dual number can be extended to a dual vector. In the usual data assimilation setting, the cost function J(x₀) is a scalar and x₀ is a multi-dimensional vector to be estimated; the dual number is then written in the form

x̂ = x + ε · x′,  (2)

where each component ε_i of the dual label vector ε = (ε₁, ε₂, ···, ε_n) has the same properties as the dual label ε, and ε_i ε_j = 0.
The dual number concept and its theory have existed for more than 100 years, but it was not until the 1980s that dual number theory was applied to robot kinematics, spatial structures and other kinematic and dynamic problems [18,19]. In recent years, dual number theory has opened up a new way of calculating derivative information accurately and efficiently [20,21].

Dual number automatic differentiation algorithm
Using the properties of the dual number, the algebraic operations in the real number field can be extended to dual number space. Assume two dual numbers m̂ = m + εm′ and n̂ = n + εn′; their sum, difference and product are [21]

m̂ ± n̂ = (m ± n) + ε(m′ ± n′),  m̂ · n̂ = mn + ε(mn′ + m′n).  (3)

In Eq (3), the term m′n′ε² is omitted not because m′ and n′ are small but because the square of ε is 0 [17]. The dual number multiplication represented by Eq (3) is therefore exact, with no truncation errors; in the same way, all of the following dual number rules are strictly correct, with no approximations or assumptions. Next, consider the polynomial operation. Eq (4) gives a polynomial in real number space:

P(x) = p₀ + p₁x + p₂x² + ··· + p_n x^n,  (4)

where p₀, p₁, p₂, ···, p_n are the polynomial coefficients. Replacing the real number x with the dual number x̂ = <x, x′> = x + x′ε in Eq (4) and applying the rules of dual number addition and multiplication, it is easy to prove that

P(x̂) = P(x) + εx′P^(1)(x),  (5)

where P^(1) denotes the first-order derivative of the real polynomial P(x) with respect to the real variable x.
Here x′ is the seed, which can take any value. Clearly, if x′ = 1, the dual part is the exact value of the first-order derivative dP(x)/dx. The other basic algebraic rules and the dual-number forms of the standard functions are detailed in the literature [21]. For other basic functions, the dual number operation rule can be obtained by a similar derivation; for a general two-variable basic function F,

F(x̂, ŷ) = F(x + εx′, y + εy′) = F(x, y) + ε(x′F_x + y′F_y).  (6)

In Eq (6), F_x and F_y are the partial derivatives of F with respect to the independent variables x and y, respectively. Eq (6) can be further extended to the case where the independent variable is the dual number vector x̂ = x + εx′:

F(x̂) = F(x) + ε∇F(x) · x′,  (7)

where x, x′ ∈ R^n are multi-dimensional vectors. When these basic arithmetic operations and functions act on mixed arguments, such as the dual number <x, x′> and a real number c, the real number c is first rewritten as the dual number <c, 0>, and the calculation then follows the rules above. The derivative of any function F(x) at a point x₀ can be obtained by directly evaluating F(<x₀, 1>) in dual number space; the result is <F(x₀), F′(x₀)>. Likewise, for a function F(x) with vector argument x ∈ R^n, the directional derivative in the direction x′ ∈ R^n at a point x₀ is obtained by directly evaluating F(<x₀, x′>) in dual number space; the result is <F(x₀), ∇F(x₀) · x′>. In conclusion, the derivative values with respect to the independent variables are obtained at the same time as the function value is computed in dual number space, which achieves automatic differentiation. Moreover, no truncation errors are introduced because the difference operation is avoided, so the computation attains machine accuracy when implemented on a computer.
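The rules in Eqs (3)-(5) can be implemented in a few lines by overloading the arithmetic operators. The following minimal sketch is illustrative only: the class name, the polynomial P and the evaluation point are our choices, not the paper's code.

```python
# A minimal sketch of dual-number arithmetic (Eqs (3)-(5)); illustrative names.
class Dual:
    """Dual number x + eps*x' with eps**2 == 0; the dual part carries the derivative."""
    def __init__(self, re, du=0.0):
        self.re = re   # real part: the function value
        self.du = du   # dual part: the derivative value

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.re + other.re, self.du + other.du)
    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.re - other.re, self.du - other.du)

    def __rsub__(self, other):
        return Dual(other) - self

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (m + eps m')(n + eps n') = mn + eps(m n' + m' n); the eps**2 term vanishes
        return Dual(self.re * other.re, self.re * other.du + self.du * other.re)
    __rmul__ = __mul__

def P(x):
    # illustrative polynomial P(x) = 2 + 3x + x**3, so P'(x) = 3 + 3x**2
    return 2 + 3 * x + x * x * x

d = P(Dual(2.0, 1.0))   # seed x' = 1: the dual part becomes dP/dx at x = 2
print(d.re, d.du)       # → 16.0 15.0  (P(2) = 16, P'(2) = 15)
```

Because only the exact product rule of Eq (3) is applied, the dual part agrees with the analytic derivative to the last bit, with no difference-step error.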

A new data assimilation method
In the derivative computing method based on dual number automatic differentiation, for any cost function J(x) with real independent variable x, a dual variable is first constructed by taking x as the real part and εx′ (where ε is the dual label and x′ = 1) as the dual part. Second, the dual variable is substituted into the cost function, so that the real function J(x) is transformed into the dual function Ĵ(x̂) with the dual number x̂ as its argument. Finally, Ĵ(x̂) is expanded as a Taylor series:

Ĵ(x̂) = J(x + εx′) = J(x) + εx′ dJ(x)/dx + (εx′)²/2! d²J(x)/dx² + ···.  (8)

Because ε^n = 0 (n > 1), the remainders of second order and higher on the right-hand side of Eq (8) vanish, and the real parts and dual parts on both sides of the equation are strictly equal:

Re[Ĵ(x̂)] = J(x),  Du[Ĵ(x̂)] = dJ(x)/dx.  (9)

In Eq (9), Re[·] and Du[·] denote the real part and the dual part of the dual function, respectively. From Eq (9), to calculate the first-order derivative of the cost function with respect to a control variable, we only need to replace the real independent variable in the cost function with the corresponding dual variable. After the value of the function, which is in dual number form, has been calculated, the first-order derivative of the cost function with respect to that variable is obtained by taking its dual part. If there are no other error sources, the accuracy of this first-order derivative is limited only by the accuracy of the computer; the derivative in Eq (9) can therefore be regarded as an exact numerical value [20,21].
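Eq (9) can be checked numerically for a scalar cost function. The sketch below is an illustration under our own assumptions: the cost J(x) = x²·sin(x) is an arbitrary choice, dual numbers are represented as (real, dual) tuples, and the rule sin(x + εx′) = sin x + εx′ cos x extends Eq (3) to the sine function.

```python
import math

# Check of Eq (9) on an illustrative scalar cost J(x) = x**2 * sin(x):
# evaluating J at <x0, 1> yields J(x0) in the real part and dJ/dx(x0) in the dual part.
def dual_mul(a, b):
    # Eq (3): (m + eps m')(n + eps n') = mn + eps(m n' + m' n)
    return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])

def dual_sin(a):
    # dual rule for sine: sin(x + eps x') = sin x + eps x' cos x
    return (math.sin(a[0]), a[1] * math.cos(a[0]))

def J(x):
    return dual_mul(dual_mul(x, x), dual_sin(x))

x0 = 1.3
val, grad = J((x0, 1.0))                 # seed x' = 1
exact = 2 * x0 * math.sin(x0) + x0**2 * math.cos(x0)
print(abs(grad - exact))                  # agrees to machine precision
```

No difference step appears anywhere, so the dual part matches the analytic derivative without truncation or cancellation error.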
For a control vector with two components x = (x₁, x₂), constructing the dual vector x̂ = (<x₁, ε₁x′₁>, <x₂, ε₂x′₂>), the dual function Ĵ(x̂) is expanded as a Taylor series:

Ĵ(x̂) = J(x₁ + ε₁x′₁, x₂ + ε₂x′₂) = J(x₁, x₂) + ε₁x′₁ ∂J/∂x₁ + ε₂x′₂ ∂J/∂x₂ + ···.  (10)

The remainders of second order and higher on the right-hand side of Eq (10) vanish by the property ε^n = 0 (n > 1). Taking the seed values x′₁ = x′₂ = 1, the real parts and dual parts on both sides of the equation are strictly equal:

Re[Ĵ(x̂)] = J(x₁, x₂),  Du_ε₁[Ĵ(x̂)] = ∂J/∂x₁,  Du_ε₂[Ĵ(x̂)] = ∂J/∂x₂.  (11)

For a multi-dimensional control vector, an expression similar to Eq (11) is obtained. Dual number differentiation can solve the derivative problem of strongly nonlinear and implicit functions that cannot be handled by general analytical methods, and it requires only one forward computation in dual number space. Moreover, in high-dimensional situations, the computation cost and the errors are considerably smaller than in the conventional difference method. After the gradient of the cost function with respect to the unknown initial state vector has been obtained, a suitable descent algorithm is selected to optimize each unknown parameter according to

x_j^{i+1} = x_j^i − r_j^{i+1} ∂J(x^i)/∂x_j,  j = 1, 2, ···, n,  (12)

where r_j^{i+1} (j = 1, 2, ···, n) is the step size at the ith iteration, whose value is determined by the descent algorithm. When i = 0, x^i is the initial guess of the unknown initial state vector. The initial state vector of the high-dimensional nonlinear dynamic prediction model is finally determined by iterating Eq (12). The specific process of data assimilation based on dual number automatic differentiation is shown in Fig 1, and the processing steps are as follows:

Step 1: First, a difference scheme (such as the upwind scheme) is used to implement the numerical procedure for solving the differential equations. Second, based on the dual number rules, the relevant procedures (including the prediction model, the observation operator, the cost function and so forth) are modified so that they are suitable for numerical calculation in dual number space.
Step 2: Provide the initial guess u⁰ of the initial state vector.
Step 3: Using the provided guess value, the cost function and the gradient vector are calculated through the automatic differentiation of dual number theory, with the following specific steps: (3a) For each component of the initial state vector, the seed is 1, and the real number is transformed into a dual number; the initial state vector is constructed in the form û^i = (<u^i_1, e_1>, <u^i_2, e_2>, ···, <u^i_n, e_n>), where e_j denotes the multi-dimensional unit vector whose jth component is 1.
(3b) The dual number vector û^i is used as the initial condition of the high-dimensional nonlinear dynamic prediction model, and the procedure implemented in the first step is used to perform a forward integration in dual number space. In this way, the evolution track of the system state x̂(t) in dual number form is obtained, where the real part and the dual part of x̂(t) represent, at each time step, the value of the state vector x(t) and the derivative of x(t) with respect to the initial state vector u, respectively.
(3c) The value of the cost function in dual number space is calculated by using x̂(t) and the observation data y_obs(t). The value of the cost function and its gradient with respect to each component of the initial state vector are then extracted according to Eq (11).
Step 4: Using the gradient information of the cost function combined with a typical gradient method, the step size r_j^{i+1} is obtained. According to Eq (12), each component of the initial state vector is updated, and the new estimate u^{i+1} is derived. If the termination conditions of the procedure (such as reaching the convergence accuracy or the maximum iteration number set in advance) are met, the procedure stops, and the estimate of the initial state vector is obtained at the same time. If the termination conditions are not met, a new iterative loop starts from the third step with the new initial state vector u^{i+1}. To reduce the computation cost, this study uses the usual convergence criterion: the difference of the cost function gradient norms between two consecutive iterations is less than a pre-specified threshold, ||∇_{u^i}J|| − ||∇_{u^{i+1}}J|| < 10⁻⁶. In such situations, the calculation error is less than the machine error.
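The four steps above can be sketched end-to-end as a twin experiment on a toy model. Everything concrete below is an assumption for illustration, not the paper's configuration: the model is an upwind discretization of u_t + u·u_x = 0 on a small periodic grid, dual numbers are carried as paired (real, dual) arrays, and Eq (12) is applied with a fixed step size instead of a line-search descent algorithm.

```python
import numpy as np

def dual_forward_cost(u0, seed, y_obs, obs_steps, nsteps, dt, dx):
    """Step 3b-3c: forward integration in dual number space. (re, du) are the
    real and dual parts; returns the cost J and its derivative along `seed`."""
    re, du = u0.astype(float), seed.astype(float)
    J_re = J_du = 0.0
    lookup = dict(zip(obs_steps, y_obs))
    for k in range(1, nsteps + 1):
        d = re - np.roll(re, 1)               # upwind difference (assumed periodic)
        dd = du - np.roll(du, 1)
        re, du = re - (dt / dx) * re * d, du - (dt / dx) * (du * d + re * dd)
        if k in lookup:                        # accumulate J = sum ||u_k - y_k||^2
            r = re - lookup[k]
            J_re += np.sum(r * r)
            J_du += np.sum(2.0 * r * du)       # dual part of the squared misfit
    return J_re, J_du

def assimilate(u_guess, y_obs, obs_steps, nsteps, dt, dx, step=0.05, iters=200):
    """Steps 2-4: seed each component with a unit vector (Step 3a), read the
    gradient from the dual parts, then apply Eq (12) with a fixed step size."""
    u, n = u_guess.copy(), u_guess.size
    for _ in range(iters):
        grad = np.array([dual_forward_cost(u, np.eye(n)[j], y_obs,
                                           obs_steps, nsteps, dt, dx)[1]
                         for j in range(n)])
        u = u - step * grad
    return u

# Twin experiment: generate noise-free observations from a known truth,
# then recover it from a constant first guess (all parameters illustrative).
n, dt, dx, nsteps = 6, 0.01, 1.0, 10
u_true = 0.5 + 0.3 * np.sin(2 * np.pi * np.arange(n) / n)
obs_steps = list(range(2, 9))
u, y_obs = u_true.copy(), []
for k in range(1, nsteps + 1):
    u = u - (dt / dx) * u * (u - np.roll(u, 1))
    if k in obs_steps:
        y_obs.append(u.copy())
u_est = assimilate(np.ones(n), y_obs, obs_steps, nsteps, dt, dx)
print(np.max(np.abs(u_est - u_true)))   # small: the truth is recovered
```

Note that the gradient costs n dual forward runs (one unit-vector seed per component), which is the per-component seeding described in Step 3a; no adjoint code or reverse integration is needed.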

Data assimilation for the nonlinear advection equation
To illustrate the usefulness of the new data assimilation method based on dual number automatic differentiation for high-dimensional models, numerical experiments are implemented for the nonlinear advection equation. The nonlinear advection equation [22,23] is

∂u/∂t + u ∂u/∂x = 0,  0 ≤ x ≤ L,  0 ≤ t ≤ T,  (13)

together with a boundary condition (the second equation of (13)). In Eq (13), u is a physical quantity, x is the horizontal space coordinate, and t is the time; L and T are the space and time ranges of the equation, respectively. The specific physical significance of this formula can be found in the literature [22,23]. The standard explicit difference scheme, namely the upwind scheme, is used to discretize the equation:

u_j^{k+1} = u_j^k − (Δt/Δx) u_j^k (u_j^k − u_{j−1}^k).  (14)

In the numerical experiment, the upwind scheme is first used to solve the nonlinear advection equation. Second, the numerical procedure for calculating the cost function is implemented (Fig 2). Third, the procedure is modified based on the dual number rules so that it operates in dual number space (Fig 3). The observation data are generated as follows. The initial state vector is u_0 = (u_{0,Δx}, u_{0,2Δx}, ..., u_{0,(N−1)Δx}, u_{0,NΔx})^T, and the true initial value is u^obs_{0,jΔx} = 3 sin(jΔxπ/6) (j = 1, 2, ..., N). The time interval is [0, 0.5] with time step Δt = 0.001, and the space grid distance is Δx = 1.0. The state of u over the discrete time series is derived by numerical integration, and Gaussian observation noise N(0, σ_o), with mean 0 and standard deviation σ_o, is then superimposed to form the observation data. If background information is not considered, the cost function in discrete form is defined as

Ĵ(u_0) = Σ_{k=1}^{N} ||u^k − y^k_obs||₂².  (15)

In Eq (15), N represents the number of observations, and ||·||₂ represents the Euclidean norm. The initial guess of the unknown initial state vector is u_0 = (1.0, 1.0, ···, 1.0).
In the iterative estimation of the unknown initial state vector, each update of the unknown vector u first requires constructing the dual number vector û^i = (<u_{i,1}, e_1>, <u_{i,2}, e_2>, ..., <u_{i,12}, e_12>), where e_j (j = 1, 2, ···, 12) denotes the 12-dimensional unit vector whose jth component is 1; this vector is input into the numerical model of the nonlinear advection equation to make a forward (0 → T) integration. Second, the cost function Ĵ(û) is calculated in dual number space. Third, the real part and the dual part are taken as the values of the cost function J and the gradient vector ∇_u J, respectively. Finally, the conjugate gradient algorithm is used to update the value of the initial state vector [14]. To verify the effectiveness of the new method, the state variables from the 4th time step to the 15th time step are selected as the observation data in the numerical experiment. Table 1 shows the estimated value of the initial condition, the final value of the cost function and the iteration number for the nonlinear advection equation under different levels of observation noise. As shown in Table 1, the estimated values are very close to the true values. Theoretically, when the gradient norm of the cost function with respect to the control variables is 0, the unknown initial state vector attains its optimal estimate. From the experimental results, the following conclusions are obtained: without background information and using only minimal observation information, the data assimilation method based on dual number automatic differentiation can accurately estimate the initial state of the nonlinear advection equation, and its estimation accuracy reaches O(10⁻⁶). As shown in Table 1, when there are no observation errors, the estimation accuracy of the unknown initial states is the highest.
Moreover, each initial state quantity is accurate to the 6th decimal place, which verifies the effectiveness of the data assimilation method based on dual number automatic differentiation in estimating the unknown initial state of high-dimensional nonlinear physical systems. In addition, with increasing observation errors, the accuracy of data assimilation decreases, but the iterative estimates still converge to the true values. When the standard deviation of the observation errors is σ_o = 0.2, the data assimilation results are still close to the true values and accurate to the first decimal place, which demonstrates that the new data assimilation method based on dual number automatic differentiation is capable of removing noise from the observations when working with high-dimensional models. It is noteworthy that Figs 4 and 5 show a spike in the iteration process. This occurs because the descent algorithm determines the iteration step from the gradient information of the cost function at every iteration, so the step is not constant; if the step is inappropriate at some iteration, a spike may appear. At the next iteration, the descent algorithm adjusts the step according to the gradient information until the difference of the cost function gradient norms between two consecutive iterations is less than the pre-specified threshold. The spike is due to the descent algorithm, not the dual number approach, and it does not influence the final convergence result.

Data assimilation for the parabolic equation
To further illustrate the usefulness of the new data assimilation method for high-dimensional models, this section takes the heat conduction equation as an example. The equation [24] is

∂u/∂t = σ ∂²u/∂x²,  (16)

where u is the state variable of temperature and σ is the heat conduction coefficient, which is a constant. The specific physical significance of this formula can be found in the literature [24]. The discrete heat conduction equation is

u_j^{k+1} = u_j^k + σ (Δt/Δx²)(u_{j+1}^k − 2u_j^k + u_{j−1}^k),  (17)

where the variable u_j^k approximates the temperature u(kΔt, jΔx) at the discrete points. The state vector and the goal of data assimilation are the same as in the first experiment, except that N = 6 in this experiment.
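The explicit scheme of Eq (17) can be sketched as below. The zero Dirichlet boundary values are an assumption, since the extracted text does not state the boundary condition, and the short demonstration run simply shows the expected diffusive damping.

```python
import numpy as np

# Sketch of the explicit scheme in Eq (17); u = 0 at both ends is assumed.
def heat_step(u, sigma, dt, dx):
    padded = np.concatenate(([0.0], u, [0.0]))        # assumed boundary values
    return u + sigma * dt / dx**2 * (padded[2:] - 2.0 * padded[1:-1] + padded[:-2])

# Stability of the explicit scheme requires sigma*dt/dx**2 <= 1/2;
# with the paper's sigma = 1, dt = 0.001, dx = 0.25 the ratio is 0.016.
sigma, dt, dx, N = 1.0, 0.001, 0.25, 6
u = np.sin(np.arange(1, N + 1) * dx * np.pi)          # true initial profile
u0_max = np.max(np.abs(u))
for _ in range(100):                                   # integrate to t = 0.1
    u = heat_step(u, sigma, dt, dx)
print(np.max(np.abs(u)) < u0_max)                      # diffusion damps the profile
```

Because the scheme uses only additions and multiplications, it propagates dual numbers unchanged, which is what makes the dual-number forward run of the assimilation method straightforward for this model.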
In the numerical experiment, the procedure, the cost function, the acquisition of the observations and the iterative estimation of the unknown initial state vector are the same as in the first experiment, but the true initial value is u^obs_{0,jΔx} = sin(jΔxπ) (j = 1, 2, ..., N). The value of the heat conduction coefficient σ is 1. The time interval is [0, 0.1], and the time step and the space grid distance are Δt = 0.001 and Δx = 0.25, respectively. Additionally, the initial guess of the initial state vector is u_0 = (2.0, 2.0, ···, 2.0). To verify the effectiveness, the state variables from the 2nd time step to the 7th time step are selected as the observation data in the numerical experiment.
Figs 6 and 7 present the experimental results of data assimilation for the heat conduction equation without observation noise. Fig 6a and 6b present the evolution of the unknown initial state quantities u_{0,2Δx} and u_{0,4Δx} during the iterative estimation, respectively. It can be observed from the figures that the iterative values converge to the true values, and the other four points behave similarly. Fig 7a and 7b show the change of the cost function J(u) and the gradient norm ||∇_u J|| with the iteration number, respectively. It can be seen from the figures that the cost function converges after the 12th iteration. It is noteworthy that Figs 6 and 7 also show a spike, caused by the same mechanism as in the first experiment. The value of the unknown initial state vector u is rounded to the 7th decimal place. The results are shown in Table 2, from which it can be observed that the estimated values are very close to the true values. From the experimental results, the following conclusions are obtained: with no background information and only minimal observation information, the new method can accurately estimate the initial state of the heat conduction equation, and its estimation accuracy reaches O(10⁻⁷), so the estimates can essentially be used as true values. In addition, with increasing observation errors, the accuracy of data assimilation decreases, but the iterative estimates still converge to the true values. When the standard deviation of the observation errors is σ_o = 0.2, the data assimilation results are still close to the true values and accurate to the first decimal place. Therefore, this experiment further verifies the effectiveness of the new data assimilation method and its ability to remove observation noise when working with high-dimensional models.

Conclusion
To overcome the shortcomings of the adjoint method for gradient computation with high-dimensional prediction models in variational data assimilation, a new data assimilation method based on dual number automatic differentiation is proposed. With dual number automatic differentiation, the gradient analysis is transformed into the computation of the cost function in dual number space, which yields the value of the gradient vector simply, efficiently and accurately. The important advantages are that the coding of the adjoint model and its reverse integration are no longer necessary and that the values of the cost function and its corresponding gradient vector can be obtained simultaneously by only one forward computation in dual number space. Compared to the direct difference method, the calculation precision is higher, because the new method is not affected by truncation errors or cancellation errors. Compared to the adjoint method, the computation cost is lower, because the new method requires neither the development of a first-order differential model nor the reverse integration of the adjoint equation. The results show that the new method can effectively estimate the initial conditions of high-dimensional numerical prediction models and is capable of removing noise. Therefore, the method proposed in this study is a new data assimilation method with strong adaptability. However, because different optimization algorithms, such as the quasi-Newton method and the conjugate gradient method (used in this paper), may slightly influence the running time, iteration number and storage requirements, we will conduct more experiments in the future to choose the most efficient optimization algorithm for each case. Moreover, we expect that using more advanced optimization algorithms in this method can improve its efficiency.
In addition, the data assimilation performance and advantages of this new method on higher-dimensional and more practical numerical prediction models will be investigated in subsequent research work.