An approach to localization for ensemble-based data assimilation

Localization techniques are commonly used in ensemble-based data assimilation (e.g., the ensemble Kalman filter (EnKF)) because the ensemble contains insufficient samples. They effectively ameliorate the spurious long-range correlations between the background and observations. However, localization is very expensive when the problem to be solved is of high dimension (say $10^6$ or higher) and observations are assimilated simultaneously. To reduce the cost of localization for high-dimension problems, this paper proposes an approach that approximately expands the correlation function of the localization matrix using a limited number of principal eigenvectors, so that the Schur product between the localization matrix and a high-dimension covariance matrix reduces to a sum of Schur products between pairs of simple vectors. These eigenvectors are in fact sine functions with different periods and phases. Numerical experiments show that when the number of principal eigenvectors reaches 20, the approximate expansion of the correlation function is very close to the exact one in both the one-dimensional (1D) and two-dimensional (2D) cases. The new approach is then applied to localization in the EnKF, and its performance is evaluated in assimilation-cycle experiments with the Lorenz-96 model and in single assimilation experiments with a barotropic shallow water model. The results suggest that the approach is feasible, providing comparable assimilation analyses at far less cost.

PLOS ONE | https://doi.org/10.1371/journal.pone.0191088, January 19, 2018

In covariance inflation algorithms, the prior ensemble state covariance is increased by linearly inflating each scalar component of the state vector before assimilating observations [5,9,27]. By reducing the underestimation of the B matrix, covariance inflation plays an important role in preventing filter divergence in ensemble-based data assimilation (EDA). In addition, a proper localization of the estimated B matrix is needed to reduce spurious non-zero long-range correlations in the B matrix and to improve its rank deficiency, which allows ensemble-based assimilation schemes with fewer than 100 members to work properly with realistic atmosphere and ocean models [9,25]. Localization is usually implemented as a Schur product between the ensemble-based B matrix and a correlation matrix whose elements are calculated with a correlation function of the coordinates. A commonly used correlation function is the quasi-Gaussian compactly supported form proposed by Gaspari and Cohn [28] (referred to hereinafter simply as the GC localization or the correlation function). However, given that the optimal localization is likely to depend on the ensemble configuration (e.g., ensemble size, observation types), a comprehensive tuning of localization is needed in practice. To avoid the challenge of tuning the localization parameters [5,25], adaptive localization functions have been proposed [9,17]. As noted in Fertig et al. [19], spatial localization remains difficult when assimilating satellite observations. To tackle this problem, Fertig et al. [19] updated the state at a given location by assimilating only those satellite observations that are strongly correlated with the model state there. In addition, Miyoshi and Sato [29] and Campbell et al. [30] explored localization functions for satellite radiances.
In Miyoshi and Sato [29], the normalized sensitivity function of satellite sensors was used as the localization weights, whereas in Campbell et al. [30] forward operators performing weighted averages of a large number of state variables were applied. Zhu et al. [24] also proposed a localization scheme to use non-local observations; their basic idea is similar to that of Liu et al. [21].
Despite the differences among various localization approaches, the computational cost of localization algorithms is always an important issue as more and more observations are used. In the following, we provide a simple comparison between the computational costs with and without localization, based on the ensemble Kalman Filter (EnKF) with simultaneous treatment for assimilating observations.
Let $m_x$, $m_y$ and $n$ be the number of control variables, the number of observations and the ensemble size, respectively. In the EnKF approach, the forecast error covariance matrix $P^f$ is calculated as follows:

$$P^f = b\,b^T, \qquad b_{m_x\times n} = \frac{1}{\sqrt{n-1}}\left(x_1-\bar{x},\; x_2-\bar{x},\; \ldots,\; x_n-\bar{x}\right). \quad (1)$$

The gain matrix is then

$$K = P^f H^T \left(H P^f H^T + R\right)^{-1}, \quad (2)$$

where $H$ is the observation operator ($m_y \times m_x$ matrix) and $R$ is the observational error covariance ($m_y \times m_y$ matrix). Since $n$ is usually smaller than $m_x$, the sample covariance matrix is rank deficient. A Schur product is therefore applied between the covariance matrix and a correlation matrix to increase the rank by removing spurious long-range correlations. The Kalman gain is then written as

$$K = \left[\rho_{m_x\times m_x} \circ P^f\right] H^T \left(H \left[\rho_{m_x\times m_x} \circ P^f\right] H^T + R\right)^{-1}, \quad (3)$$

where $\rho_{m_x\times m_x}$ is a compactly supported correlation matrix in which each column represents spatial correlations at a given model gridpoint. This method is referred to as model-space localization. Following Houtekamer and Mitchell [13], localization has also been applied in observation space, leading to the Kalman gain

$$K = \left[\rho_{m_x\times m_y} \circ \left(P^f H^T\right)\right]\left[\rho_{m_y\times m_y} \circ \left(H P^f H^T\right) + R\right]^{-1}. \quad (4)$$

Since

$$H b \approx P^y_{m_y\times n} = \frac{1}{\sqrt{n-1}}\left(y_1-\bar{y},\; y_2-\bar{y},\; \ldots,\; y_n-\bar{y}\right), \quad (5)$$

where $P^y$ is an $m_y \times n$ matrix, Eqs (2) and (4) can respectively be expressed as

$$K = b \left(P^y\right)^T \left[P^y \left(P^y\right)^T + R\right]^{-1} \quad (6)$$

and

$$K = \rho_{m_x\times m_y} \circ \left[b \left(P^y\right)^T\right] \left[\rho_{m_y\times m_y} \circ \left(P^y \left(P^y\right)^T\right) + R\right]^{-1}. \quad (7)$$

When the initial analysis increment $x' = K\, y'^{\,obs}$ is calculated using Eqs (6) and (7), respectively, their costs are quite different, where $y'^{\,obs}$ is the $m_y$-dimension observation innovation vector.
If we ignore the difference between the costs of calculating

$$y^{r} = \left[P^y \left(P^y\right)^T + R\right]^{-1} y'^{\,obs} \qquad \text{and} \qquad y^{rl} = \left[\rho_{m_y\times m_y} \circ \left(P^y \left(P^y\right)^T\right) + R\right]^{-1} y'^{\,obs},$$

which are relatively small, the calculation of the increment with localization using Eq (7), i.e., $x' = \rho_{m_x\times m_y} \circ \left[b \left(P^y\right)^T\right]\, y^{rl}$, needs $m_x \times m_y \times (n+2)$ multiplications and $m_x \times m_y \times n - m_x$ additions, while that without localization using Eq (6), i.e., $x' = b\left[\left(P^y\right)^T y^{r}\right]$, takes only $(m_x+m_y)\times n$ multiplications and $(m_x+m_y-1)\times n - m_x$ additions.
To give an intuitive sense of the huge cost of localization in the EnKF when assimilating multi-source observations, including satellite measurements, consider a typical realistic NWP configuration with $m_x = 10^7$, $m_y = 10^5$ and $n = 30$. In this case, the calculation of the increment with localization takes about $3\times10^{13}$ multiplications and about $3\times10^{13}$ additions, much more expensive than without localization, which takes about $3\times10^{8}$ multiplications and $3\times10^{8}$ additions.
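These operation counts are easy to reproduce. Below is a back-of-the-envelope sketch (the function names are ours, and only multiplications are counted):

```python
def muls_with_localization(mx, my, n):
    # Eq (7): forming rho o [b (P^y)^T] takes mx*my*(n+1) multiplications
    # (product plus Schur product), and the matrix-vector product with
    # y^rl adds mx*my more: mx*my*(n + 2) in total.
    return mx * my * (n + 2)

def muls_without_localization(mx, my, n):
    # Eq (6) kept separable: w = (P^y)^T y^r costs my*n, then x' = b w costs mx*n.
    return (mx + my) * n

mx, my, n = 10**7, 10**5, 30
print(f"with localization:    {muls_with_localization(mx, my, n):.1e}")
print(f"without localization: {muls_without_localization(mx, my, n):.1e}")
```

The ratio between the two counts is about five orders of magnitude for this configuration, which is the gap the new scheme aims to close.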
To reduce the huge cost of localization in the EnKF, two kinds of methods are frequently used: batch processing [2] and serial processing [14]. In batch processing, the observations are organized into batches and each batch is assimilated simultaneously, while in serial processing, observations are assimilated sequentially, one by one. However, even with batch or serial processing, the computational cost of the EnKF remains large in practice. For example, the EnKF with serial processing needs $m_y$ sequential assimilations, each requiring at least $m_x \times n$ multiplications according to the gain matrix formula (see Eq (6)); in total, at least $m_x \times m_y \times n$ multiplications are needed for the serial implementation of the EnKF. This is still a huge cost. As in the previous example, with $m_x$, $m_y$ and $n$ equal to $10^7$, $10^5$ and 30, respectively, the EnKF performs at least $3\times10^{13}$ multiplications; the cost reduction from serial processing is clearly not significant. The cost of batch processing can be estimated in the same way and is just as large. The main advantage of these two methods is that they simplify obtaining the solution $y^{rl} = \left[\rho_{m_y\times m_y} \circ \left(P^y (P^y)^T\right) + R\right]^{-1} y'^{\,obs}$, which is much less costly than computing $x' = \rho_{m_x\times m_y} \circ \left[b (P^y)^T\right] y^{rl}$.
The remainder of this paper proceeds as follows. First, a more detailed description of the new scheme is given. Then, we compare its filtering performance with that of the standard scheme. We also conduct preliminary tests using the new approach in the EnKF. Finally, we discuss our methods, the scope and limitations of this study, and some possible extensions.

Methodology
The basic idea of covariance localization is to limit the number of observations that can affect the analysis at a particular gridpoint. A simple technique for this is observation selection, in which the analysis is affected only by observations within a cutoff radius [2]. Another way to implement covariance localization is to apply a Schur product between the forecast error covariance matrix and a correlation matrix [13], either in model space (Eq (3)) or in observation space (Eq (4)). For example, the element $\rho_{i,j}$ of the correlation matrix $\rho_{m_x\times m_y}$ can be written as

$$\rho_{i,j} = C_0\!\left(r^{h}_{i,j}\right) C_0\!\left(r^{v}_{i,j}\right), \quad (8)$$

where $r^{h}_{i,j}$ and $r^{v}_{i,j}$ are the non-dimensional horizontal and vertical distances between the $i$-th control variable and the $j$-th observation, respectively. $C_0$ in (8) is the GC correlation function:

$$C_0(r) = \begin{cases} 1 - \dfrac{5}{3}r^2 + \dfrac{5}{8}r^3 + \dfrac{1}{2}r^4 - \dfrac{1}{4}r^5, & 0 \le r \le 1,\\[4pt] 4 - 5r + \dfrac{5}{3}r^2 + \dfrac{5}{8}r^3 - \dfrac{1}{2}r^4 + \dfrac{1}{12}r^5 - \dfrac{2}{3r}, & 1 < r \le 2,\\[4pt] 0, & r > 2, \end{cases} \quad (9)$$

where $r = d_{i,j}/d_0$, $d_{i,j}$ is the distance between $L_i$ and $L_j$, the spatial coordinates of the $i$-th control variable and the $j$-th observation, respectively, and $d_0$ is the filtering radius. From Eqs (6) and (7), it is clear that the increase in computational cost due to localization is mainly caused by the Schur product between $b (P^y)^T$ and the correlation matrix $\rho_{m_x\times m_y}$, which changes $b (P^y)^T$ from a separable form to an inseparable one. If the Schur product could be avoided, this expensive calculation would reduce to a time-saving product between $b$ and an $n$-dimensional vector. If the localization matrix $\rho_{m_x\times m_y}$ could be decomposed into a product of two vectors,

$$\rho_{m_x\times m_y} = \rho^{x}_{m_x} \left(\rho^{y}_{m_y}\right)^T, \quad (10)$$

the aforementioned Schur product would become separable:

$$\rho_{m_x\times m_y} \circ \left[b \left(P^y\right)^T\right] = \tilde{b}\left(\tilde{P}^y\right)^T, \quad (11)$$

where $\rho^{x}_{m_x}$ and $\rho^{y}_{m_y}$ are $m_x$- and $m_y$-dimension vectors, respectively, and $\tilde{b}$ and $\tilde{P}^y$ are

$$\tilde{b} = \frac{1}{\sqrt{n-1}}\left(\rho^{x}\circ(x_1-\bar{x}),\; \ldots,\; \rho^{x}\circ(x_n-\bar{x})\right), \qquad \tilde{P}^y = \frac{1}{\sqrt{n-1}}\left(\rho^{y}\circ(y_1-\bar{y}),\; \rho^{y}\circ(y_2-\bar{y}),\; \ldots,\; \rho^{y}\circ(y_n-\bar{y})\right). \quad (12)$$

In this way, the high computational cost resulting from localization can be greatly reduced.
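For reference, the piecewise polynomial of Eq (9) is straightforward to vectorize. The sketch below is our NumPy implementation (the name `gc` is ours):

```python
import numpy as np

def gc(r):
    """Gaspari-Cohn compactly supported correlation C0(r), with r = d/d0 (Eq 9)."""
    r = np.atleast_1d(np.abs(np.asarray(r, dtype=float)))
    out = np.zeros_like(r)                 # r > 2 stays exactly zero
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    out[inner] = 1 - (5/3)*ri**2 + (5/8)*ri**3 + (1/2)*ri**4 - (1/4)*ri**5
    out[outer] = (4 - 5*ro + (5/3)*ro**2 + (5/8)*ro**3
                  - (1/2)*ro**4 + (1/12)*ro**5 - 2/(3*ro))
    return out

print(gc([0.0, 0.5, 1.0, 1.5, 2.0]))
```

The two branches agree at r = 1 (both give 5/24), and the function decreases monotonically to zero at r = 2, which is what makes the support compact.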
However, this localization matrix cannot be expressed in the form of Eq (10), because it is impossible to decompose the correlation function into the form

$$C_0\!\left(r_{i,j}\right) = \rho^{x}(L_i)\,\rho^{y}(L_j) \quad (13)$$

according to its definition in Eq (9). How to decompose the correlation function thus becomes the key to reducing the high computational cost of localization. Liu et al. [21] used empirical orthogonal functions (EOFs) to decompose the correlation function on a low-resolution grid, and then interpolated the chosen dominant modes to the high-resolution grid. This is one of the earliest studies to expand the GC localization function. It is an efficient method that avoids the high cost of performing the EOF decomposition directly on the high-resolution model grid, but it inevitably reduces the accuracy of the calculated correlation function, owing to the limited precision of the leading modes computed on the low-resolution grid and to the interpolation from the low-resolution grid to the high-resolution grid. Buehner et al. [31,32], Bishop et al. [33] and Kuhl et al. [34] adopted scaled spherical harmonics to decompose the correlation function, providing an analytical and continuous expansion of the GC localization function. However, these methods are difficult to apply to regional assimilation, because the spherical harmonics, which require homogeneous or periodic boundary conditions, are better suited to a spherical domain. For the same reason, regional models rarely use spherical harmonics to discretize their dynamical cores. To avoid these problems, this study seeks a group of basis functions with which to expand the correlation function,

$$C\!\left(L_i, L_j\right) \approx \sum_{k} \beta_k\, e_k(L_i)\, e_k(L_j), \quad (14)$$

so that the expansion is applicable to assimilation in both spherical and rectangular domains. The basis function $e_k(L)$ in (14) is analytical and subject to the orthogonality condition

$$\int_{\Omega} w(x)\, e_k(x)\, e_{k'}(x)\, dx = \delta_{k,k'}, \quad (15)$$

where $w(x)$ is a weighting function and $\Omega$ is the domain of the model.
The coefficient $\beta_k$, which is the eigenvalue (or variance) of the $k$-th basis function, can be calculated directly from the orthogonality condition (Eq (15)):

$$\beta_k = \int_{\Omega}\!\int_{\Omega} w(L)\, w(L')\, C\!\left(L, L'\right)\, e_k(L)\, e_k(L')\, dL\, dL'. \quad (16)$$

The upper limit $K_c$ of the sum in (14) is the number of basis functions, which is either infinite when $\beta_k$ is calculated from Eq (16), or a finite positive integer depending on the resolution at which the coefficients are calculated discretely. If the orthogonal basis functions are just the intrinsic modes of the correlation function, a finite number of leading modes can be chosen to express the correlation function approximately:

$$C_{K_0}\!\left(L_i, L_j\right) = \sum_{k=1}^{K_0} \beta_k\, e_k(L_i)\, e_k(L_j), \quad (17)$$

where $K_0$ is the number of selected leading modes, which should be much smaller than $K_c$. $K_0$ can be determined from a criterion on the contribution of the accumulated variance of the chosen leading modes to the total variance, $\sum_{k=1}^{K_0}\beta_k \big/ \sum_{k=1}^{K_c}\beta_k$ (say, 95% or more). In this way, the localization matrix $\rho_{m_x\times m_y}$ can be simplified to

$$\rho_{m_x\times m_y} \approx \sum_{k=1}^{K_0} \beta_k\, \rho^{(x,k)}_{m_x} \left(\rho^{(y,k)}_{m_y}\right)^T. \quad (18)$$

In Eq (18), $\rho^{(x,k)}_{m_x}$ and $\rho^{(y,k)}_{m_y}$ are $m_x$-dimension and $m_y$-dimension vectors, respectively. The localization can now be reduced to

$$\rho_{m_x\times m_y} \circ \left[b\left(P^y\right)^T\right] \approx \sum_{k=1}^{K_0} \beta_k\, \tilde{b}^{(k)} \left(\tilde{P}^{y,(k)}\right)^T, \quad (19)$$

where

$$\tilde{b}^{(k)} = \frac{1}{\sqrt{n-1}}\left(\rho^{(x,k)}\circ(x_1-\bar{x}),\; \ldots,\; \rho^{(x,k)}\circ(x_n-\bar{x})\right), \qquad \tilde{P}^{y,(k)} = \frac{1}{\sqrt{n-1}}\left(\rho^{(y,k)}\circ(y_1-\bar{y}),\; \rho^{(y,k)}\circ(y_2-\bar{y}),\; \ldots,\; \rho^{(y,k)}\circ(y_n-\bar{y})\right). \quad (20)$$

The new localization scheme mainly needs $(m_x+m_y)\times n\times K_0\times 2$ multiplications and $((m_x+m_y-1)\times n - m_x)\times K_0$ additions. Under the same resolutions as mentioned in the introduction ($m_x = 10^7$, $m_y = 10^5$, and $n = 30$), and with $K_0 = 20$, about $1.2\times10^{10}$ multiplications and $6\times10^{9}$ additions are performed in the new scheme. Its cost is roughly 2500 times lower than usual. In the next subsection, we define the basis functions and then investigate the precision of the expansion on the right side of Eq (17) in the 1D and 2D cases.
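The algebraic identity that Eqs (18)-(20) rely on can be checked numerically on toy dimensions. In the sketch below the truncated expansion is a synthetic random one (all names and sizes are ours), but the identity is the same: a Schur product with a sum of rank-one matrices equals a sum of separable products, so the $m_x \times m_y$ matrix never has to be formed:

```python
import numpy as np

rng = np.random.default_rng(1)
mx, my, n, K0 = 40, 25, 6, 3

# Hypothetical truncated expansion of the localization matrix (cf. Eq 18):
# rho = sum_k beta_k * outer(rho_xk, rho_yk)
beta = rng.random(K0)
rho_xk = rng.random((K0, mx))
rho_yk = rng.random((K0, my))
rho = sum(beta[k] * np.outer(rho_xk[k], rho_yk[k]) for k in range(K0))

b = rng.standard_normal((mx, n))    # scaled state perturbations
Py = rng.standard_normal((my, n))   # scaled observation-space perturbations

# Direct Schur (elementwise) product: forms the full mx-by-my matrix.
direct = rho * (b @ Py.T)

# Separable evaluation (cf. Eqs 19-20): localize the perturbation columns instead.
separable = sum(
    beta[k] * ((rho_xk[k, :, None] * b) @ (rho_yk[k, :, None] * Py).T)
    for k in range(K0)
)
assert np.allclose(direct, separable)
```

With a genuine expansion the equality in Eq (19) is only approximate, because Eq (18) truncates the mode sum; the identity itself, however, is exact for whatever K0-term expansion is used.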

Basis functions
As discussed above, expanding the correlation function by means of a group of basis functions can greatly reduce the cost of the localization. Therefore, the construction of basis functions is the first step for the expansion. Actually, the eigenvectors of the correlation function on the discrete grid with a prescribed resolution can be used to determine the features of the basis functions and, ultimately, to construct them analytically. For this purpose, the eigenvectors of the discrete correlation function are investigated first in the following. For convenience of discussion, investigations are conducted in the 1D case under periodic and non-periodic boundaries, respectively.
1D case under non-periodic boundary condition. Suppose the domain of definition is [a, b], of length $l_0 = b - a$. Uniformly partition the interval [a, b] with $m$ gridpoints located at

$$x_i = a + (i-1)\,dx, \qquad dx = l_0/(m-1), \qquad i = 1, 2, \ldots, m. \quad (21)$$

If the filtering radius is $d_0$, the non-dimensional distance can be expressed as

$$r_{i,j} = \left|x_i - x_j\right|/d_0. \quad (22)$$

In this way, the value of the correlation function between the gridpoints can be calculated according to Eq (9): $c_{i,j} = C_0(r_{i,j})$ ($i = 1,2,\ldots,m$; $j = 1,2,\ldots,m$), whose elements form the localization matrix $\rho^{non\text{-}periodic}_{m\times m}$, a sparse banded matrix. Because this matrix is symmetric, it has $m$ real non-negative eigenvalues $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_m \ge 0$ and corresponding unit orthogonal eigenvectors $s_1, s_2, \ldots, s_m$, so that

$$\rho^{non\text{-}periodic}_{m\times m} = \sum_{k=1}^{m} \sigma_k\, s_k\, s_k^T. \quad (23)$$

When $m$ is very large, the $K_0$ leading eigenvectors with the largest eigenvalues can be chosen to approximately expand the localization matrix:

$$\rho^{non\text{-}periodic}_{m\times m} \approx \sum_{k=1}^{K_0} \sigma_k\, s_k\, s_k^T. \quad (24)$$

This means that the eigenvectors of the localization matrix are the best choice for the basis functions in the discrete case. We are therefore interested in their analytical forms.
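A small numerical check of this eigen-expansion is shown below (our sketch; the grid size, the filtering radius in grid units, and the `gc` implementation of Eq (9) are chosen for illustration). It builds the banded localization matrix, computes its eigenpairs, and reports the variance fraction captured by the 20 leading modes:

```python
import numpy as np

def gc(r):
    """Gaspari-Cohn correlation C0(r) of Eq (9), repeated here for self-containment."""
    r = np.atleast_1d(np.abs(np.asarray(r, dtype=float)))
    out = np.zeros_like(r)
    inner, outer = r <= 1.0, (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    out[inner] = 1 - (5/3)*ri**2 + (5/8)*ri**3 + (1/2)*ri**4 - (1/4)*ri**5
    out[outer] = (4 - 5*ro + (5/3)*ro**2 + (5/8)*ro**3
                  - (1/2)*ro**4 + (1/12)*ro**5 - 2/(3*ro))
    return out

m, i0 = 201, 20.0                      # gridpoints and filtering radius in grid units
i = np.arange(m)
rho = gc(np.abs(i[:, None] - i[None, :]) / i0)   # banded, symmetric, PSD

sig, s = np.linalg.eigh(rho)           # eigh returns ascending eigenvalues
sig, s = sig[::-1], s[:, ::-1]         # reorder to descending
frac = sig[:20].sum() / sig.sum()
print(f"variance captured by the 20 leading modes: {frac:.4f}")
```

Because the matrix entries are non-negative, the leading eigenvector has a single sign (no zero crossings), consistent with the half-sine shape discussed next.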
The leading eigenvectors of $\rho^{non\text{-}periodic}_{m\times m}$ are computed at several resolutions, from m = 101 to m = 801, and they barely differ from each other: the differences between two adjacent resolutions are much smaller than the eigenvectors themselves in amplitude (Fig 2B), and these differences become smaller as the resolution increases (Fig 2B). This suggests that the sine-function-like eigenvectors are insensitive to grid resolution. On the other hand, the relative change of the extended boundary, defined as $\varepsilon = (l - l_0)/l_0$, lies in the small range [0.066, 0.075] as the resolution varies from m = 101 to m = 801. This indicates that the analytical forms of the eigenvectors can be approximated by sine functions with different frequencies, which can then be used as the basis functions:

$$e_k(x) = \sin\!\left(\frac{k\pi\,(x-\tilde{a})}{l}\right), \qquad k = 1, 2, \ldots, \quad (25)$$

where $[\tilde{a}, \tilde{b}]$ is the slightly extended domain of definition, of length $l = \tilde{b} - \tilde{a} = (1+\varepsilon)\,l_0$. These functions are orthogonal on the extended domain $[\tilde{a}, \tilde{b}]$:

$$\int_{\tilde{a}}^{\tilde{b}} w(x)\, e_k(x)\, e_{k'}(x)\, dx = \delta_{k,k'}, \quad (26)$$

where $w(x) = 2/l$. Using these sine functions, the correlation function can be approximately expressed by a truncated expansion:

$$C_{K_0}(x, x') = \sum_{k=1}^{K_0} \beta_k\, e_k(x)\, e_k(x'), \quad (27)$$

where

$$\beta_k = \int_{\tilde{a}}^{\tilde{b}}\!\int_{\tilde{a}}^{\tilde{b}} w(x)\, w(x')\, C_0\!\left(\frac{|x-x'|}{d_0}\right) e_k(x)\, e_k(x')\, dx\, dx'. \quad (28)$$

1D case under periodic boundary condition. In this case, the domain of definition is taken to be a zonal circle at a latitude $\theta$, i.e., the longitude $\lambda$ varies from a = 0 to b = 2π. Uniformly partition the interval [a, b] with m gridpoints located at $\lambda_i = (i-1)\,d\lambda$, where $d\lambda = 2\pi/(m-1)$ and $i = 1, 2, \ldots, m$. For any $\lambda_i$ and $\lambda_j$ on the zonal circle, their distance can be defined as the arc length $d_{i,j} = R_0\cos\theta \times \min\!\left(|\lambda_i-\lambda_j|,\, 2\pi-|\lambda_i-\lambda_j|\right)$ because of the periodic boundary, where $R_0$ is the radius of the Earth. When the filtering radius in longitude is $\lambda_0$, the geometric filtering radius is $d_0 = R_0\cos\theta \times \lambda_0$. Consequently, the non-dimensional distance between $\lambda_i$ and $\lambda_j$ is $r_{i,j} = d_{i,j}/d_0 = \min\!\left(|\lambda_i-\lambda_j|,\, 2\pi-|\lambda_i-\lambda_j|\right)/\lambda_0$, with which the localization matrix $\rho^{periodic}_{m\times m}$ calculated according to Eq (9) is still symmetric, but no longer banded. Similarly, it has m real non-negative eigenvalues $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_m \ge 0$ and corresponding unit orthogonal eigenvectors $s_1, s_2, \ldots, s_m$, so that

$$\rho^{periodic}_{m\times m} = \sum_{k=1}^{m} \sigma_k\, s_k\, s_k^T. \quad (29)$$

As in the non-periodic case, the spatial distributions of the eigenvectors can be approximately expressed as sine functions with different frequencies and phases. On the periodic domain [a, b] they can be defined as phased sine pairs:

$$e_{2k-1}(\lambda) = \sin(k\lambda), \qquad e_{2k}(\lambda) = \sin\!\left(k\lambda + \frac{\pi}{2}\right), \qquad k = 1, 2, \ldots. \quad (30)$$

These functions are orthogonal on the domain of definition [a, b]:

$$\int_{a}^{b} w(\lambda)\, e_k(\lambda)\, e_{k'}(\lambda)\, d\lambda = \delta_{k,k'}, \quad (31)$$

where $w(\lambda) = 2/l$ and $l = b - a = 2\pi$. Using these sine functions, the correlation function can be approximately expressed by a truncated expansion:

$$C_{K_0}(\lambda, \lambda') = \sum_{k=1}^{K_0} \beta_k\, e_k(\lambda)\, e_k(\lambda'), \qquad \beta_k = \int_{a}^{b}\!\int_{a}^{b} w(\lambda)\, w(\lambda')\, C_0\!\left(r(\lambda,\lambda')\right) e_k(\lambda)\, e_k(\lambda')\, d\lambda\, d\lambda'. \quad (32)$$
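The orthogonality condition (Eq (31)) can be verified by direct quadrature; below is a sketch for the periodic case (the frequency and phase bookkeeping is our own choice, pairing sin(kλ) with its quarter-period shift):

```python
import numpy as np

l = 2.0 * np.pi                       # periodic domain [0, 2*pi)
m = 4000
x = np.linspace(0.0, l, m, endpoint=False)
dx = l / m
w = 2.0 / l                           # weighting function w = 2/l

# Sines with different frequencies and phases: sin(k*x) and sin(k*x + pi/2).
basis = [np.sin(k * x + p) for k in range(1, 6) for p in (0.0, np.pi / 2)]

# Gram matrix of weighted inner products; rectangle rule is exact here
# because the integrands are trigonometric over a full period.
gram = np.array([[np.sum(w * ei * ej) * dx for ej in basis] for ei in basis])
print(np.abs(gram - np.eye(len(basis))).max())
```

The Gram matrix comes out as the identity to machine precision, confirming that each phased-sine pair is orthonormal under the weight 2/l.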

Distance functions
The distance function, or the non-dimensional distance function, is critical for forming the localization matrices $\rho^{non\text{-}periodic}_{m\times m}$ and $\rho^{periodic}_{m\times m}$ in the 1D cases based on Gaspari and Cohn [28]. If a 1D model has equally spaced grids, as assumed in the 1D cases above under periodic and non-periodic boundary conditions, the non-dimensional distances can be expressed as functions of the grid index $i$. For example, in the 1D periodic case, the non-dimensional distance between two model gridpoints can be formulated as $r_{i,j} = d_{i,j}/d_0 = \min\!\left(|i-j|,\, m-1-|i-j|\right)/i_0$, according to the expressions $\lambda_i = (i-1)\,d\lambda$, $\lambda_j = (j-1)\,d\lambda$ and $2\pi = (m-1)\,d\lambda$, where $i_0 = \lambda_0/d\lambda$. Similarly, the non-dimensional distance between two model gridpoints in the 1D non-periodic case can be expressed as $r_{i,j} = d_{i,j}/d_0 = |i-j|/i_0$, where $i_0 = d_0/dx$. For the non-dimensional distance between a model gridpoint and an observation location, the subscript $j$ may be a real number: $j = 1 + \lambda_j/d\lambda$, where $\lambda_j$ is the observation location.
In the 2D case, the non-dimensional distance between two model gridpoints can also be expressed as a function of the grid indices (i, j). If the domain is rectangular with $m\times n$ discrete grids, i.e., [a, b; c, d] with grid sizes $dx = (b-a)/(m-1)$ and $dy = (d-c)/(n-1)$, the distance between any two discrete points $A(x_{i_A}, y_{j_A})$ and $B(x_{i_B}, y_{j_B})$ in this domain is $d_{A,B} = \sqrt{(x_{i_A}-x_{i_B})^2 + (y_{j_A}-y_{j_B})^2}$, where $x_i = a + (i-1)\,dx$ and $y_j = c + (j-1)\,dy$. The corresponding non-dimensional distance is then $r = \sqrt{\left((i_A-i_B)/i_0\right)^2 + \left((j_A-j_B)/j_0\right)^2}$, where $i_0 = d_0/dx$ and $j_0 = d_0/dy$. Because the correlation function $C_0(r)$ is closely related to the exponential function [35],

$$C_0(r) \approx e^{-\alpha r^2}, \quad (33)$$

where the constant $\alpha > 0$, the correlation function is approximately separable in the 2D case: $C_0(r) \approx C_0(r_x)\,C_0(r_y)$, with $r_x$ and $r_y$ the non-dimensional distances in the two directions. Therefore, we assume that the 2D expansion $\tilde{C}_{K_0}(r)$ is approximately separable as well:

$$\tilde{C}_{K_0}(r) \approx \tilde{C}_{K_0}(r_x)\, \tilde{C}_{K_0}(r_y). \quad (34)$$

This means a 2D correlation function can be calculated from two 1D correlation functions, which greatly reduces the complexity of the calculation. If (i, j) is used to express an observation location, i and j may not be integers, as in the 1D case.
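How good is the separable approximation $C_0(r) \approx C_0(r_x)\,C_0(r_y)$? A quick numerical check is sketched below (`gc` re-implements Eq (9); the grid of test distances is our choice):

```python
import numpy as np

def gc(r):
    """Gaspari-Cohn correlation C0(r) of Eq (9), repeated for self-containment."""
    r = np.atleast_1d(np.abs(np.asarray(r, dtype=float)))
    out = np.zeros_like(r)
    inner, outer = r <= 1.0, (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    out[inner] = 1 - (5/3)*ri**2 + (5/8)*ri**3 + (1/2)*ri**4 - (1/4)*ri**5
    out[outer] = (4 - 5*ro + (5/3)*ro**2 + (5/8)*ro**3
                  - (1/2)*ro**4 + (1/12)*ro**5 - 2/(3*ro))
    return out

# Sweep the non-dimensional component distances r_x and r_y over [0, 2.5].
r1 = np.linspace(0.0, 2.5, 126)
RX, RY = np.meshgrid(r1, r1)
exact = gc(np.sqrt(RX**2 + RY**2))     # C0 of the Euclidean distance
separable = gc(RX) * gc(RY)            # product of the two 1D factors
err = np.abs(exact - separable).max()
print(f"max abs difference: {err:.3f}")
```

The worst-case pointwise difference is a few percent of the unit correlation at zero distance, which is why the separable form is acceptable for localization purposes.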
If the 2D domain is the spherical surface with longitude-latitude coordinates $(\lambda, \theta) \in [0, 2\pi;\, -\pi/2, \pi/2]$, which is widely used in global atmospheric models, the exact distance between two discrete points $A(\lambda_{i_A}, \theta_{j_A})$ and $B(\lambda_{i_B}, \theta_{j_B})$ is the great-circle distance

$$d_{A,B} = R_0 \cos^{-1}\!\left(\sin\theta_{j_A}\sin\theta_{j_B} + \cos\theta_{j_A}\cos\theta_{j_B}\cos\!\left(\lambda_{i_A}-\lambda_{i_B}\right)\right).$$

However, this formula may lead to inseparability in the calculation of the corresponding correlation function. For this reason, the distance function here is approximately defined as the hypotenuse of the curved-edge right triangle formed by the points A, B and O, where the point O can be $O_A(\lambda_{i_B}, \theta_{j_A})$ or $O_B(\lambda_{i_A}, \theta_{j_B})$; that is, the two right triangles $\Delta BAO_A$ and $\Delta ABO_B$ share the same hypotenuse. These two triangles have the same meridional leg length but different zonal leg lengths ($AO_A$ and $O_B B$), and the approximate distance is taken as the hypotenuse computed from the meridional leg and the zonal legs. In this way, the correlation function for localization in the spherical domain can be computed from 1D correlation functions according to Eq (34).
To give an intuitive sense of the accuracy of this distance approximation, we consider a spherical domain with grid sizes of 4.5° × 4.5°, as used by the spherical barotropic model in Section 3.2. One location is selected at the equator and the other at a higher latitude. Table 1 gives the exact arc length between two points on the sphere (the arc length AB) and the approximate value calculated with the right-triangle formula above. The longer the distance between the two points A and B, the larger the error; and the error at the higher latitude is larger than that near the equator. For example, the error in the distance between point A (90°N, 120°E) and point B (72°N, 138°E) at high latitudes is about 24 km, so the relative error is no more than 2.4%. Compared with the filtering radius used to define the non-dimensional distance (e.g., eight grids in the spherical barotropic model of the next section), the influence of such an error is much smaller and negligible. The distance errors at lower latitudes are even smaller.
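A sketch of this comparison follows (our code; the Earth radius value and the choice to average the two zonal legs of the right triangles are our assumptions):

```python
import numpy as np

R0 = 6371.0  # Earth radius in km (assumed value)

def great_circle(lamA, thA, lamB, thB):
    """Exact arc length AB on the sphere (the spherical-law-of-cosines formula)."""
    c = np.sin(thA)*np.sin(thB) + np.cos(thA)*np.cos(thB)*np.cos(lamA - lamB)
    return R0 * np.arccos(np.clip(c, -1.0, 1.0))

def right_triangle(lamA, thA, lamB, thB):
    """Approximate hypotenuse: meridional leg plus the mean of the two zonal legs."""
    d_mer = R0 * abs(thA - thB)
    d_zon = 0.5 * R0 * (np.cos(thA) + np.cos(thB)) * abs(lamA - lamB)
    return float(np.hypot(d_mer, d_zon))

deg = np.pi / 180.0
A = (120*deg, 90*deg)   # (lon, lat) of point A: 90N, 120E
B = (138*deg, 72*deg)   # point B: 72N, 138E
exact = great_circle(A[0], A[1], B[0], B[1])
approx = right_triangle(A[0], A[1], B[0], B[1])
print(f"exact {exact:.1f} km, approx {approx:.1f} km, error {abs(exact-approx):.1f} km")
```

For this high-latitude pair the error is on the order of 24 km against an exact distance of about 2000 km, consistent with the relative error quoted in the text.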

Preliminary evaluation
Given the analytical basis functions in Eq (25), numerical tests are conducted to evaluate the expansions of the correlation function with different truncations by comparison with the original function, in the 1D and 2D cases, respectively.
1D case. Under the non-periodic boundary condition, we set $x' = x_{i_0}$ and select $i_0$ to be 49. The original correlation function $C_0(r_i)$ (black curves in Fig 3) and its expansion $C_{K_0}(x, x')$ (green curves in Fig 3) with different truncation numbers $K_0$ are then calculated on the discrete grids. The larger the truncation number $K_0$, the closer the expansion gets to the truth (Fig 3). Some fluctuations are clearly visible along the true curve where the correlation coefficients are very small when $K_0 = 10$ (Fig 3A). As the truncation number increases, these fluctuations become markedly weaker at $K_0 = 15$ (Fig 3B) and ultimately disappear at $K_0 = 20$ (Fig 3C). This means that the first 20 modes form the dominant part of the localization function. In terms of variance contribution, the 20 leading modes account for more than 97% of all modes, no matter how high the resolution becomes (e.g., m = 1001 or 10001; see Table 2); in other words, the large number of remaining modes accounts for less than 3%. Table 3 shows that many of the modes have very small eigenvalues, which contribute very little to the correlation function when fitting observations. Therefore, with a reasonably small number of modes, the new localization saves computational time without sacrificing much accuracy. Considering the 1D case under the periodic boundary condition, with the same experimental setup but with $x' = x_{i_0}$ and $i_0$ selected to be 90, Fig 4 shows the original correlation function $C_0(r_i)$ (black curves) and its expansion $C_{K_0}(x, x')$ (green curves) with different truncation numbers $K_0$.
Consistent with our findings in the non-periodic case, increasing the truncation number leads to higher accuracy of the values calculated through the expansion.
2D case. In this case, the non-dimensional distance between two points (x, y) and (x', y') is defined as $r = \sqrt{\left((x-x')/d_0\right)^2 + \left((y-y')/d_0\right)^2}$. Fig 5 compares the expansions with different truncation numbers $K_0$ (Fig 5A, 5B and 5C) with the original correlation function defined by Eq (9). The conclusion is similar to that in the 1D case: the larger the truncation number $K_0$, the closer the expansion becomes to the original correlation function.

Assimilation experiments
The above section demonstrates the consistency between the sine basis functions and the GC correlation function. Here, we check it further with assimilation experiments. One experiment assimilates all observations simultaneously in the EnKF with the new localization approach; the other, following the practice of some numerical forecast centers, assimilates observations serially, one at a time, in the EnKF with the GC correlation function.
The assimilation experiments are preliminary tests using observing system simulation experiments (OSSEs) with two models of increasing complexity: the Lorenz-96 model [36] and a spherical barotropic shallow water model. The "true" state (or "truth") is defined by a long-term model run, and the corresponding "observations" are generated by adding uncorrelated random noise to the "truth."
Lorenz-96 40-variable model. This model has been widely used to test ensemble-based assimilation methods in a number of earlier studies [14,37]. It is based on the following set of differential equations:

$$\frac{dx_j}{dt} = \left(x_{j+1} - x_{j-2}\right)x_{j-1} - x_j + F, \qquad j = 1, 2, \ldots, M, \quad (35)$$

where $j$ is the spatial coordinate; the forcing parameter and the number of spatial elements are set to F = 8 and M = 40, respectively. The model solves Eq (35) using the fourth-order Runge-Kutta scheme with a time step of 0.05, and the boundary conditions of Eq (35) are periodic: $x_{j+M} = x_j$ [38]. Simulations over a period following a long-term integration of the model (e.g., $10^5$ time steps) from an arbitrary initial condition are taken as the "truth." The observational data sets include observations of all model variables, produced by adding uncorrelated Gaussian random noise (with zero mean and variance 4.0) to the truth at every step. We conduct three assimilation experiments: one using 500 members without any localization (named "EXP-1"), and two using 20 members with the new (named "EXP-2") and traditional (named "EXP-3") localization schemes, respectively. The localization radius is set to eight grid spacings, and all experiments use the covariance inflation method of Zhang et al. [10]. The experiments with 20 members are much more timesaving than the experiment with the large ensemble size (EXP-1), which takes 428 seconds in the same computing environment.
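The Lorenz-96 integration described above can be sketched as follows (our code; the small perturbation used to kick the model off its unstable equilibrium is arbitrary):

```python
import numpy as np

F, M = 8.0, 40   # forcing and number of spatial elements
dt = 0.05        # time step

def l96(x):
    # dx_j/dt = (x_{j+1} - x_{j-2}) * x_{j-1} - x_j + F, periodic in j (Eq 35)
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=dt):
    # Fourth-order Runge-Kutta step, as used in the experiments.
    k1 = l96(x)
    k2 = l96(x + 0.5 * dt * k1)
    k3 = l96(x + 0.5 * dt * k2)
    k4 = l96(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2*k2 + 2*k3 + k4)

x = F * np.ones(M)       # the steady state x_j = F ...
x[19] += 0.01            # ... perturbed slightly to trigger chaotic behavior
for _ in range(1000):    # short spin-up toward the attractor
    x = rk4_step(x)
print(f"mean {x.mean():.2f}, std {x.std():.2f}")
```

After the spin-up the state sits on the chaotic attractor, from which a "truth" trajectory and synthetic observations can be drawn as described above.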
The significant timesaving characteristics of the new localization scheme become apparent in the following experiments with a more complex model.
Spherical barotropic shallow water model. To further compare the performance and computational cost of the new and traditional localizations, we use a spherical barotropic shallow water model to conduct two OSSEs. The model was built with a finite difference scheme with exact energy and mass conservation [39], solving the shallow water equations in spherical coordinates:

$$\frac{\partial u}{\partial t} + \frac{u}{a\cos\theta}\frac{\partial u}{\partial \lambda} + \frac{v}{a}\frac{\partial u}{\partial \theta} - \left(f + \frac{u\tan\theta}{a}\right)v + \frac{1}{a\cos\theta}\frac{\partial \varphi}{\partial \lambda} = 0,$$

$$\frac{\partial v}{\partial t} + \frac{u}{a\cos\theta}\frac{\partial v}{\partial \lambda} + \frac{v}{a}\frac{\partial v}{\partial \theta} + \left(f + \frac{u\tan\theta}{a}\right)u + \frac{1}{a}\frac{\partial \varphi}{\partial \theta} = 0,$$

$$\frac{\partial \varphi}{\partial t} + \frac{1}{a\cos\theta}\left[\frac{\partial (\varphi u)}{\partial \lambda} + \frac{\partial (\varphi v\cos\theta)}{\partial \theta}\right] = 0.$$

Here, $\theta$ and $\lambda$ are the latitude and longitude, respectively; $u$, $v$ and $\varphi$ denote the zonal velocity, meridional velocity and geopotential height, respectively; $a$ is the Earth's radius and $f$ is the Coriolis coefficient.
The model has a horizontal resolution of 4.5° × 4.5° (81 × 41 gridpoints). The initial condition is a four-wave Rossby-Haurwitz wave. A 20-day integration is conducted, and after a 10-day spin-up the last 10 days are taken as the "truth" (i.e., the nature run). Synthetic observations of geopotential height are created every four gridpoints from the truth plus uncorrelated Gaussian random noise (with zero mean and variance 10000.0). In this way, there are 861 φ observations in all.
A common and easy way to implement the traditional localization in the EnKF is serial processing [14], which assimilates the observations one by one in a cycle. This is considered more timesaving than simultaneous assimilation with the traditional localization, with similar performance. Therefore, the serial implementation is taken as the traditional localization here and compared with the new localization using the spherical barotropic model. Two OSSEs are designed for the comparison: one uses the traditional localization with serial implementation (called "ASSM_old"), and the other adopts the new localization with simultaneous implementation (named "ASSM_new"). The filtering radius in both experiments is eight grid spacings. Fig 7 compares the horizontal error distributions of geopotential height among the background (or first guess) and the analyses from ASSM_old and ASSM_new. Both analyses greatly reduce the phase errors of about 30° of longitude present in the background. In addition, the analysis of ASSM_new at higher latitudes shows marked improvement over ASSM_old. In terms of RMSE, ASSM_new (1195.982) outperforms the traditional scheme (1437.468), while both analyses are much better than the background (2752.343). As for the computational costs of the two localization schemes, the new localization uses only 25 seconds, far more timesaving than the traditional one, which needs 312 seconds in the same computing environment.

Discussions
In recent years, ensemble-based approaches have been widely used for various problems, e.g., data assimilation and the solution of conditional nonlinear optimal perturbations [40]. Because an ensemble generally has far fewer members than both the number of observations and the degrees of freedom of the model variables, many spurious correlations arise between different observation locations, between different model grids, or between observation locations and model grids. Schur product-based covariance localization has become a practical and powerful tool for making ensemble-based methods perform well even with small ensemble sizes [41]. However, a disadvantage of the traditional localization schemes is their large cost.
When few observations are assimilated, there is little difference in computational cost between the two localization schemes. With a large number of observations, however, the computational cost increases dramatically even for the serial implementation of the EnKF. Expanding the localization function with a few basis functions is therefore useful for improving efficiency. This study is a preliminary attempt to develop and improve the localization approach within the EDA process. As a first and necessary step, the new scheme was evaluated in its application to simultaneous assimilation using idealized experiments. Further studies are required in three aspects. First, it is necessary to investigate its role in EDA for real and complex forecast models. Second, an effort should be made to propose a new serial assimilation scheme that uses the leading modes one by one, exploiting the orthogonality between these modes, similar to the way observations are assimilated one by one in the serial processing of the EnKF under the hypothesis of independence between observations. Third, it is worth exploring how to implement the adaptive localization approach [9,17], because it is now well understood that adaptive localization functions may be more appropriate, although the GC localization function has been widely used in EDA methods. This work is anticipated to be challenging because of the notable difference between the GC localization function, which is completely independent of the ensemble samples, and adaptive localization functions, which apply complex corrections based on the ensemble samples. This difference may make it very difficult to expand adaptive localization functions using sine basis functions, because different ensemble members may produce different eigenvector families in adaptive localization.

Conclusions
In this paper, we proposed an economical approach to implementing covariance localization. We used a group of basis functions to expand the correlation function, and found that the spatial distributions of the leading eigenvectors of the correlation function are very close to sine waves defined on the domain of definition. We therefore adopted sine functions with different frequencies and phases as approximate basis functions, so that the localization matrix can be decomposed into a sum of products of pairs of vectors and the Schur product becomes separable. In this way, the cost of localization can be greatly reduced.
Numerical tests in one and two dimensions were conducted to evaluate the expansions of the correlation function. Both tests demonstrated that the larger the truncation number, the closer the expansion becomes to the original correlation function. When the truncation number reaches 20, the difference between the expansion and the truth is very small.
The scheme was then verified in an assimilation cycle with the Lorenz-96 model and in a single assimilation experiment with a spherical barotropic shallow water model, using OSSEs with the EnKF. In general, when the ensemble size is much larger than the dimension of the model (e.g., 500 for a simple model like the Lorenz-96), localization has little influence on the assimilation results and is thus not needed. However, when the ensemble size is smaller than the dimension of the model (say, 20), localization is necessary. The experiments with the simple model suggested that ensemble assimilation using a small ensemble with the new localization scheme can achieve performance comparable to that with the traditional localization, and to that obtained with a large ensemble without any localization. The new localization even outperformed the traditional one with serial processing in the OSSEs using the spherical barotropic shallow water model. Moreover, the computational cost depends on the ensemble size: the larger the ensemble, the higher the cost. The new localization was shown to be far more timesaving than the serial implementation of the traditional localization in the single assimilation experiments with the spherical barotropic shallow water model, although its timesaving advantage was insignificant for the simple model because of the very low dimensions of the control variables and observations.