A Hybrid Optimization Method for Solving Bayesian Inverse Problems under Uncertainty

In this paper, we investigate the application of a new method, the Finite Difference and Stochastic Gradient (Hybrid method), for history matching in reservoir models. History matching is one of the processes of solving an inverse problem by calibrating reservoir models to dynamic behaviour of the reservoir in which an objective function is formulated based on a Bayesian approach for optimization. The goal of history matching is to identify the minimum value of an objective function that expresses the misfit between the predicted and measured data of a reservoir. To address the optimization problem, we present a novel application using a combination of the stochastic gradient and finite difference methods for solving inverse problems. The optimization is constrained by a linear equation that contains the reservoir parameters. We reformulate the reservoir model’s parameters and dynamic data by operating the objective function, the approximate gradient of which can guarantee convergence. At each iteration step, we obtain the relatively ‘important’ elements of the gradient, which are subsequently substituted by the values from the Finite Difference method through comparing the magnitude of the components of the stochastic gradient, which forms a new gradient, and we subsequently iterate with the new gradient. Through the application of the Hybrid method, we efficiently and accurately optimize the objective function. We present a number numerical simulations in this paper that show that the method is accurate and computationally efficient.


Introduction
Subsurface geology is always uncertain. Uncertainty assessment of geological description and reservoir production prediction is an inverse problem and is usually performed by generating a suite of plausible realizations of the reservoir model that are consistent with the available data (Fig 1). Randomized maximum likelihood (RML) was introduced by Oliver [1] and is used to build an a posteriori probability density function (PDF). Generating a realization with RML involves minimizing an objective function with an optimization method [2,3]. In the early studies of history matching, the least square method was used for optimization. Because the objective function usually contains a large number of parameters, the descending dimension method was introduced into the optimization process to reduce the computation time.
There are two different types of methods for optimization in history matching: the gradientbased method and the randomized method. The gradient obtained from the gradient-based method can be calculated by either using the Jacobi or adjoint method. Due to the complexity of solving the Hessian matrix, which is required in the gradient-based method, it could take many iteration steps to obtain a small value of the objective function. The Quasi-Newton method, which does not require calculating the Hessian matrix, was used by P. Yang [4] to minimize the objective function; however, due to having a long computation time, it can only be used in oneor two-dimensional flows. The Gauss-Newton method was later applied to simulate three-phase flows and extend the scope of the history matching, even though the computation of the gradients is often more expensive than solving the flow equations. To overcome this problem, some have used fast streamline-based simulation methods for history matching [5][6][7].
The randomized methods have overcome some of the drawbacks of gradient-based methods because there is no need to calculate the accurate gradient; instead, one only obtains the stochastic gradient by the value of the objective function. Simulated Annealing and Evolutionary Algorithms such as genetic algorithms [8] and Evolution Strategies have been adapted in various reservoir performance optimization frameworks, Ant colony foraging optimization algorithm [9,10]which is drived from ant colony foraging process is one of global search methods. The ant will leave a kind of hormone in the process of foraging and the subsequent ants can identify the amount of these hormones to judge whether this path is the best route. Through this algorithm we can get the optimal solution. What they have in common is that a large amount of calculation and a slow rate of convergence. For large practical optimization instances, they have a narrow applicability because the computation time is long. For this reason, the Ensemble Kalman filter method was introduced into the reservoir history matching field. The Ensemble Kalman filter (EnKF) is one of the Monte Carlo approaches for history matching [11,12]. The EnKF method has resolved many intractable reservoir problems in recent years.
The simultaneous perturbation stochastic approximation (SPSA) algorithm, which was developed on the basis of the Kiefer-Wolfowitz stochastic approximation algorithm [13], has also attracted much attention for challenging optimization problems in which it is not easy to directly obtain a accurate gradient of the objective function for the variavles being optimized [13,14]. SPSA is an easily implemented and highly efficient stochastic gradient that relies on an objective function. In contrast, the finite difference approach requires a lot of function measurements for which the amount is proportional to the dimension of the gradient vector. Applications of the SPSA include training neural networks, monitoring signals, et al. H. Klie [15] and Gao [16] introduced SPSA into the reservoir history matching field. Without considering the correlation of the model parameters, SPSA reaches an unsatisfactory goal. For this reason, Li and Reynolds [17] replaced the Bernoulli distribution with a Gaussian distribution of  disturbance variables and introduced a covariance matrix into SPSA; this approach has a better matching result but has a slow rate of convergence. At present, those methods which combine many different kinds of optimization algorithms have been widely applied in the reservoir production optimization field [18,19]. We adopt an approach by combining partial finite differences with the stochastic method based on a directional derivative to solve the inverse problem and quantify the difference frequency in order to increase the stability of this method. This method (called the Hybrid method) mainly replaces the corresponding important components of the stochastic gradient in sequence and integrates them to create a new approximation gradient for optimization. Examples of using the Bayesian approach to solve engineering problems can be found in [19][20][21][22][23][24][25][26] When quantifying the uncertainty in the model parameter identification problems, the novelty of the present work rests in the demonstration of the potential for accelerating the solving time of inverse solution. By employing the Hybrid method and the improved algorithm in the inverse solution process, the solution time can be considerably reduced.

Objective function
The Randomized Maximum Likelihood Estimation (RMLE) is one of the methods that can be performed to find the value of x by Probability theory, where the overall probability function p (X,θ) is known, and θ is an unknown parameter of the set Ф.
We filter N e random samples, which are defined as X 1 ; X 2 ; Á Á Á ; X N e , from the population X, and obtain the corresponding actual observed values x 1 ; x 2 ; Á Á Á ; x N e . Then, the probability that Due to Probability theory, the probability function p(θ) can obtain the maximum value. In engineering problems, parameters (such as porosity, permeability) usually have a type of probability distribution. From experience, we know that these parameters usually obey a Gaussian distribution in an actual application.
The Probability Density Function is Where x~N(μ,σ 2 ), μ is the mean of x, and σ 2 is the variance. Eq 2 can also be expressed as Where x denotes the N e dimensional vector, which includes uncertain parameters; and x av denotes the prior model parameters; C X is the N e × N m covariance matrix of the measurement errors that are generated by the statistical methods, and C X 2 R N e ÂN m .
Based on statistical theory, the relationship between the observed production data and the reservoir model parameters in the oilfield development process is Where d obs denotes the N d dimensional vector that includes the actual observed data; g represents the numerical simulator of the reservoir system; ε r is the measurement error; and C D is the covariance matrix.
Maximum likelihood estimation can be performed to find the parameters that maximize the probability that the calculated value is the same as the observed value; thus, the conditional probability distribution function of d obs under the condition of x in the reservoir model is Gradient-based optimization and regularization techniques are mainly used toward this goal, generally by maximizing the logarithm of the likelihood and thus, resulting in a single estimated value for a local maximum. A Bayesian approach is adopted herein, which leads not only to a point estimate of the parameters of interest but also to a probability distribution. By attributing a prior distribution to the model parameters and applying the Bayesian theorem, we obtain their posterior distribution, which enables one to quantify the uncertainty concerning the parameter values.
The Bayesian theorem provides the solution of the inverse problem pðxjd obs Þ / pðd obs jxÞpðxÞ / exp À 1 2 ðd obs À gðxÞÞ T C À1 D ðd obs À gðxÞÞ À where p(x) is the prior distribution. To obtain the maximum value of the conditional probability of x under d obs , the objective function is When the conditional probability is the maximum value, the corresponding objective function becomes the minimum value. Therefore, reservoir history matching problems can be translated into resolving minimum value problems of the objective function P(x). When the objective function is the minimum, the corresponding variables x are close to the actual parameters.

Singular value decomposition
For actual reservoir history matching problems, the dimension N m of the reservoir parameters that require inversion is typically tens of thousands. Therefore, it is extremely difficult to optimize the objective function, and the computational cost can become tremendously expensive due to the large size of the linear systems, especially the large amount of computation that is required to calculate the inverse matrix C À1 X ðx À x av Þ. Therefore, this paper utilizes the singular value decomposition method. Through this method, C À1 X and C À1 X ðx À x av Þ can be transformed into low-dimensional matrixes, which avoids complex computation.
According to the definition of covariance, the covariance matrix of the initial model can be approximately calculated as where dX ¼ ½dx 1 ; dx 2 ; . . . ; dx i ; . . . dx N e N m ÂN e , and the jth column vector is (x − x av ). With Singular Value Decomposition, Where U and V are the singular vectors of δX, and Λ is the singular value of δX. U is composed of orthogonal unit characteristic vectors of C X . Λ T Λ is composed of characteristic values. Because Assuming that there are N S non-zero singular values in Λ, then The approximate inverse matrix of C X is By substituting C X of the objective function with C À1 X , the N m dimensional optimization problem about x is reduced to N s dimensions, which avoids the process of solving the inverse of C X (its dimension can reach up to tens of millions). This method has greatly simplified the difficulty of calculation and improved the computational efficiency. The final function becomes Solution

Linear sequential solutions
This paper uses the linear search method for optimization. Here, the linear search consists of two key elements: one is the search step size, which ensures the convergence of the search, and the other is the search direction, which determines the rate of convergence. In actual application, the normalization method of the search direction is usually used for iterative calculation. Now, we introduce the optimization method of the linear search.
In the process of optimizing the objective function P(x), set the initial variable vector x 0 to zero and calculate the initial objective function value P(x 0 ) first. At the kth(k = 1,2,3,ÁÁÁ,K max ) iteration step, do the following: First, set the initial search step λ k = 1, and calculate the stochastic gradient g k (x) as the search direction D k through the optimization algorithm; then, normalize D k ; Second, update the variables vector x k by Third, calculate the objective function value P(x k ) and compare P(x k ) with P(x k-1 ). Then, determine whether P(x k ) satisfies 0 jPðx k ÞÀPðx kÀ1 Þj maxfPðx k Þ;1g ε p . If the result is YES, then exit the iteration; if the result is NO, then keep the values of x k and P(x k ), and then, set k = k + 1 and continue the next iteration step. If P(x k ) > P(x k-1 ), then reduce λ k by half and continue to calculate the objective function value P(x k ) until it reaches the maximum number of times m that λ k can be halved; if beyond, exit and continue another iteration.
Algorithm I. Linear optimization algorithm. 1. Generate the initial value x 0 . 2. For k = 1,2,3,ÁÁÁ until convergence (a) Calculate the stochastic gradientĝ k (x) as the search direction D k through the optimization algorithm and set the initial search step to λ k = 1.
(b) For i = 1,2,ÁÁÁ,m, calculate If the result is Yes, then exit the iteration. If the result is No, then In the above algorithm, ε p is approximately 10 −4 , and k•k 1 is the infinite norm value. Because the stochastic gradientĝ k (x) in the optimization process is always calculated by the stochastic algorithm, we introduce the stochastic algorithm first.

Stochastic algorithm based on the directional derivative
The basic principle of this algorithm is to disturb the argument x ¼ ½x 1 ; x 2 ; Á Á Á ; x N e T and obtain a new variable vector . .
x N e þ aDx N e Where α is the disturbance step, and Dx ¼ ½Dx 1 ; Dx 2 ; Á Á Á ; Dx N e T is the disturbance variables vector, with the element Δx i (i = 1,2,ÁÁÁ,N e ) in accordance with the Bernoulli distribution of 1 or -1; therefore, Dx À1 i is equal to Δx i . By calculating the corresponding objective function values and taking the difference of each increment, we can obtain the stochastic gradientĝ(x).
In each iteration step,ĝ(x) of P(x) at x can be expressed aŝ The stochastic gradient iŝ The directional derivative reflects the rate of change of the objective function value in a specific direction. The formulation is where @P @l is the directional derivative of the objective function in the l ! direction; and cosθ i is the cosine value of he i ;li.
The directional derivative of x by using the true gradient can be expressed as From Eqs 18 and 19, we can obtain @P @l Set g 0 ¼ coshφi Á ðcoshφiÞ T Áĝ . From Eq 20, we can see that g' is the increasing direction, which ensures the convergence of this algorithm.
We usually obtain the average of the stochastic gradient through several disturbances in pursuit of improving the accuracy of the gradient, and then, we resolve the problem by the above optimization method (Algorithm I) after obtaining a stochastic gradient. g ðxÞ ¼ Algorithm II. Stochastic algorithm based on directional derivative. 1. For k = 1,2,ÁÁÁN, generate random Δx k (a) Calculate x k = x + αΔx k and p(x k ), (b) Then, computê End for Because the gradient obtained from Algorithm II is approximate, the stochastic gradient with a perturbation and directional derivative has a high uncertainty. Although Algorithm II has improved some of the shortcomings that other algorithms have, it also has many deficiencies, such as a large number of iterations and slow convergence of the objective function. To avoid these shortcomings, we propose a new algorithm by modifying Algorithm II.

Principle of the hybrid algorithm
The main idea of the Hybrid algorithm is to first calculate the stochastic gradient of the objective function by using Algorithm II and, then, to replace the components with the largest (in magnitude) stochastic gradient with approximate values of the gradient from the finite difference in a proper sequence until the direction of the modified gradient is nearly the same as the unknown real gradient direction.
According to Taylor's formula, P(x+αΔx) − P(x) can be expanded as Then, where e k is a unit vector in the kth direction. Substituting Eq 23 into Eq 22, we obtain Here, ΔP k (x) is the increment of the objective function value in the kth direction. The conclusion is that the increment of the objective function value with the simultaneous perturbation of all of the components is approximately equal to the sum of each function value increment with the separate disturbance of each component of the vector x.

Hybrid algorithm
A simple equation is introduced to vertify the changement og some components has a great impact on the result of function.
From Table 1, we can realize that Δy(x) takes the maximum value when the 4th component is changed, which has the greatest impact on the result. Here, x ð4Þ 0 is the largest component of all of the components, and thus, we regard x ð4Þ 0 as the 'important' element. We could find out the 'relatively important' elements of the stochastic gradient by relying on this standard.
The index collection of all of the components in the stochastic gradient can be denoted as U = {1,2,ÁÁÁ,N e }; I u and I d are its two subsets, which satisfy I u [ I d = U and I u \ I d = ϕ.I u denotes the relatively 'important' components of the stochastic gradient, and I d denotes the 'unimportant' components.
The hybrid algorithm is used to calculate the elements of I u with the finite difference method and replace the elements of I d by the stochastic gradient.
Determining the search direction is a process of continuous iteration; thus, we set the initial values I 0 u ¼ and Now, we consider the mth iteration step. First, generate a stochastic gradient by using Algorithm II: Pðx k þ ae ðjÞ Þ À Pðx k Þ a ! e ðjÞ cosðφÞ The lth component of the gradientĝ(x k ) iŝ Replace ΔP (j) with ΔP (j),m ; then, Then, obtain the realistic gradient rP mþ1 u ðx k Þ, where the ith element (i 2 I mþ1 u ) can be obtained by finite differences, and record the other elements as zero.
Calculate DP ðjÞ;mþ1 u and DP ðjÞ;mþ1 d DP ðjÞ;mþ1 Where j = 1,2,ÁÁÁN max . Fourth, calculate the angle θ between the approximate gradient and the unknown true gradient, as follows: If cos(θ) ! ε, then set D k ¼ rP m u and exit the Loop. If cos(θ) < ε, then determine whether the 'important' elements in I m u obtained by the stochastic gradient correspond to the important elements of the true gradient. Check each ith element in the collection I m u , as follows: If there exists one component of the stochastic gradient that is 'important', but the value recalculated by finite differences does not satisfy Eq 36, then mark this identification of the 'important' component as failure and add the number of failure times to 1. The maximum number of failure times allowed in the mth iteration step is N f,max .
If the number of failure times N i exceeds N f,max , then we should reselect new samples, recalculate the significant elements and determine the 'important' elements. If N i < N f,max , then reselect N key 'important' elements from I m d by finite differences, and set m = m+1; continue this iteration until cos(θ) ! ε.

The standard of judgment
As mentioned earlier, P u is the approximation gradient that is in almost the same direction as the true gradient. The cosine of the angle between these two gradients is given by where rP u is the approximation gradient that is obtained by using finite differences, and rP is the unknown true gradient. Set rP u,l (x) to be the lth component of rP(x). When the element l satisfies l 2 l u in rP(x), then calculate rP(x) by finite differences; if l 2 l d , then set it to zero. Then, Replacing (rP u ) T rP by krP u k 2 in Eq 37, with Because Z obeys the Gaussian distribution, therefore Then, and We can obtain The angle between the approximation gradient and the true gradient can be calculated by Eq 46.
According to Pradlwarter (2007) [27], to determine whether the lth component of the gradient in I u is 'important', we can calculate the square value of this component and compare it with the average of all of the components of rP(x k ), as follows: If the result is 'YES', then mark it as 'important'. Conversely, if the result is 'NO', then consider it to be unimportant. Because of the unknown true gradient rP(x k ), according to Eq 43, Eq 47 can be expressed as With the above equation, we can determine whether the lth element is 'important' or not.

The verification and improvement
Function I. The first mathematical function is to illustrate the reliability of the Hybrid algorithm. The objective is to minimize the following function: Where x = [x 1 ,x 2 ,ÁÁÁx N ] T , and the initial guess is x i = 2,i = 1,ÁÁÁ,N. We set the number of variables to be N = 100. The minimum of Eq 49 is 0, which occurs when x = [0,0,ÁÁÁ0] T .
In Algorithm II, set the initial search step to λ 0 = 1, regard the average of the stochastic gradient calculated with 5 stochastic perturbations as the search direction, and set the disturbance variable α to 0.015. Record the objective function values of each iteration step and the cosine values.
In the Hybrid algorithm: (1) Obtain the stochastic gradient by using Algorithm II, and then, select 5 'important' components of the stochastic gradient, which constitutes a new approximate gradient to iterate; (2) The angle between the approximate gradient and the true gradient should satisfy cos(θ) ! 0.8. If not, then recalculate and reselect the 'important' elements; (3) The maximum number of failure allowed is 2; (4) The maximum number of simulations is 25; if this number is exceeded, then recalculate the stochastic gradient. Calculate the objective function value for the optimization in both Algorithm II and the Hybrid algorithm, and compare the rate of convergence and cos(θ) for the two methods. Table 2 shows the behavior and results obtained from Hybrid algorithm for the first iteration m = 1. The first column refers to the iteration index s during the calculation of the approximate gradient rP u , and two iterations are used to obtain rP u until satisfies cos(θ) ! 0.8. Column 2 to Column 5 show that five components of the gradient from Algorithm II are recomputed by using finite difference at each iteration, meanwhile, the component index and the specific gradient values are shown in column 2, column3 and column 4, respectively. Because the gradient is particular stochastic, the chosen components of the gradient by Algorithm II may not necessarily refer to the actual "important components". As shown at iteration s = 1, the 71th component value of the gradient by Algorithm II is as large as -1613.82 while the recomputed approximate gradient is a much smaller 81, notice that in Eq 47, the 71th component turns out to be failure. When the failure time reaches the maximum allowable number 2, it suggests that the approximate gradient is not accurate enough, then continue another iteration until cos(θ) ! 0.8. During the two iterations, components 71,128,55,183,85,10,25,116,70 and 45 are recomputed by finite difference and the rest components of are set equal to zero. Fig 2 shows that the cosine value of θ obtained by Algorithm II is approximately 0.3, which illustrates that the angle between the approximate gradient and the true gradient is approximately 73 degrees; however, the cosine value of θ obtained by the Hybrid algorithm is larger than 0.8 (in other words, the angle between the approximate gradient and the true gradient does not more than 35 degrees). The result has demonstrated that the direction of the gradient from the Hybrid algorithm is closer to the true gradient direction than it from Algorithm II, which illustrates that the Hybrid algorithm has a higher accuracy than Algorithm IIin the search direction at each iteration step. Fig 3(A) shows that the objective function value will decrease in both Algorithm II and the Hybrid algorithm. This finding means that the objective function also converges with the Hybrid algorithm. From the above Fig, we can find that the Hybrid algorithm behaves better than Algorithm II and has a higher rate of convergence. At approximately the 400th computation time in the Hybrid algorithm, the objective function value is close to the final value, while Algorithm II requires probably over 800 computation times to reach the same value. Thus, the Hybrid algorithm reaches the same result of convergence as Algorithm II with less simulation runs. Fig 3(B) shows that to achieve the same value of the objective function, Algorithm III requires 20 iteration steps, while Algorithm II requires more than 100 iteration steps.  Function II. The Rosenbrock function is a nonconvex function that is often used to test the performance of an optimization algorithm. Hence, the objective is to minimize the Rosenbrock function for the N-variable case, which is given by Where x = [x 1 ,x 2 ,ÁÁÁ x N ] T , and the initial guess is x i = 2,i = 1,ÁÁÁ,N We consider four cases, in which the number of variables is N = 100,200,400,800, and we compare the optimization results by using the Hybrid algorithm. In each step, we require the accuracy in the direction of the estimated gradient to satisfy cos(θ) ! 0.8. Fig 4(A) shows that the objective function can obtain a minimum value with the Hybrid algorithm in different cases, and the function evaluations increase with the increase in the number of variables. We also could attain the same property from Fig 4(B). The result has verified that this algorithm is acceptable with a different number of variables.
Next, we consider the four cases with respect to both Eqs 49 and 50, to optimize and compare the cosine values and finite difference frequency (Table 3). From Table 3, we can realize that the estimated gradient by the Hybrid algorithm satisfies cos(θ) ! 0.8 with different numbers of variables; the finite difference (F-D) times are not the same in different steps, and the average number of finite differences increases with the increase in the number of variables.
From Table 3, we can realize that the finite difference frequency can be excessive, which would reduce the convergence in a certain iteration step, while in another iteration step, the finite difference frequency could be very small. Because of this uncertainty, we introduce the average number of finite differences in each step instead. Now, we discuss the relationship between the differential rate and the number of variables. Fig 5 illustrates the differential rate (the ratio of the difference frequency and the number of variables in each step) in different cases, which we have discussed. Through this Fig, we can obtain that in most cases, the differential rate is less than 0.15, which means that we can obtain a much more accurate gradient by finite differences for 15 times per 100 variables. Thus, we take 15% of the number of all the variables for finite differences to replace the judgment of the cosine value.
Because we use quantitative finite differences to replace the judgment of the cosine value in every step, the Hybrid algorithm can become the following: According to the above analysis, we can realize the following: (1) the approximate gradient obtained by introducing finite differences into Algorithm II has better convergence than that generated only by Algorithm II; (2) With fewer iterations and less computation time, the Hybrid algorithm has a better optimization result than Algorithm II; and (3) the quantitative finite difference frequency can improve the stability of the Hybrid algorithm. Those characteristics provide the possibility for the application of solving the inverse problem of reservoir history matching.

Numerical Examples
Case1: Theoretical reservoir model This reservoir simulation model is a simple 2D model with 25×25 two-dimensional log-permeability distributions with great heterogeneity. It includes a total of 9 injection wells and 4 production wells. In this model, the period of history matching is 1200 days, and every control step is 120 days. In other words, the well control parameter will be adjusted per every 120 days.
In the process of history matching, the permeability at each of the 625 cells in the grid must be calculated and updated. The production data required for matching includes the flowing bottom-hole pressure and the oil and water production of the wells. We can regard the simulation result as the observed data of the reservoir by adding the errors of the normal distribution to the result of the initial model. We generate 100 initial reservoir models based on prior geological information by using the Sequential Gaussian Simulation Method (Fig 6). From Fig 6, we can see that there exist large differences between the distributions of the initial permeability among those models; the hypertonic and hypotonic zones are quite different. However, every model has reflected the basic reservoir characteristics: the direction of the hypertonic stripes and their approximate positions. By generating 100 stochastic models have reduced the construction error of the models because of their differences. Fig 7 shows the real permeability field which generated from 100 random reservoir models. Because the parameters in those models are in accordance with Gaussian distributions, the real permeability (Fig 7) is also in accordance with Gaussian distributions. We use Algorithm II and III for the optimization of history matching objective function, and then compare the optimization results.
To obtain the minimum value of Eq 14, we use both Algorithm II and Algorithm III. In Algorithm II: We set the initial step to 1 in the linear search process and obtain the stochastic gradients, which requires 5 random perturbations at each step; then, we iterate with their average gradient.
In Algorithm III: The initial setting and the process of generating the stochastic gradient are consistent with Algorithm II. The difference between the two algorithms is that the gradients for the iterations are not the same. In Algorithm II, the calculated stochastic gradient is used directly for optimization, while in Algorithm III, a new gradient must be generated from the previous stochastic gradient with quantitative finite differences.
From Fig 8, we can see that several hypertonic stripes appear in the matching permeability field, and the trends of the stripes are similar to those in the reference permeability field. In Fig 8(A), the distributions of the hypertonic stripes are clearly in the matching permeability field but are very different from those in the reference permeability field because of the randomness of the gradient obtained by using Algorithm II. In Fig 8(B), the distribution of the hypertonic stripes not only is clear in the matching permeability field but also is much closer to those in the reference permeability field. With this result, we can realize that by using Algorithm III, we can describe the reference permeability field more clearly than by using Algorithm II. Now, we analyze the optimization result of the production data by using both algorithms. Fig 9 shows the rate of water production of well Pro-4. The observed data varies greatly with the increase of time. We can obtain the simulation result that the stratigraphic parameters can be inversed by Eq 14. From Fig 10, we can realize that the matching curve from using Algorithm III is much closer to the observed data than that obtained by using Algorithm II.

Case 2: Three-phase reservoir model
This reservoir simulation model is a 20×30 two-dimensional model, and it has a total of 6 production wells. In this model, the period of history matching is 9000 days, and every control step is 30 days. In other words, the well control parameters will be adjusted per every 30 days. In the process of history matching, the variables that must be resolved include the permeability at each cell, and the number of cells is 600. The production data that need matching include the bottom-hole flowing pressure, oil, gas and water production of wells.
In the production process of this reservoir model, the 6 wells began constant production, one-at-a-time in a sequence per 90 days. The first one is Pro-1, and the last one is Pro-6. After 1630 days, Pro-1 was converted into a water injection well while other production wells retained the same quota.
Before the start of the numerical simulation, we set the initial permeability field, which is based on reservoir information, as in Fig 10(A). Fig 10(B) shows the real permeability field. Then, we used two algorithms for optimization. Fig 11 shows the estimate of the log permeability that was obtained by history matching of the observed bottom pressure and the WOR (water oil ratio) data with the two algorithms. The  matching result obtained by Algorithm II, as shown in Fig 11(A), shows the basic permeability channel structure without exactly matching the truth.
In Fig 11(B), the matching result shows that the matching result by Algorithm III has correctly reflected the permeability channel features, especially near the production wells where the reservoir characteristics can be exactly described.
From Fig 12, we can see that there emerges a very large gap between the initial and observed data. After the history matching by using the two algorithms respectively, the results agree well with the observed data.
The matching results of the cumulative gas production are shown in Fig 12. In the early stages of oil field development, the gas production data of both the initial and real model are consistent. Those data begin to change at the 6000th day. After producing for 9000 days, the cumulative production of gas in the real model is 1.0203×10 7 m 3 . The matching result by Algorithm II is 9.3057×10 6 m 3 , while the matching result by Algorithm III is 1.0191×10 7 m 3 . Algorithm III has a better matching effect than Algorithm II. Both Algorithms II and III have the same matching result for the other production data. Fig  13 shows the matching result of the other production data by Algorithm III. We can realize that through Algorithm III, we can obtain a satisfied optimization result, and Algorithm III can resolve the inverse problem of optimization.
One of the criteria for comparing two algorithms is the rate of convergence of the objective function. The initial objective function value is 4.16×10 5 (Fig 14). After a sufficient number of simulations, both of the algorithms could minimize the objective function value to near zero.
From Figs 12 and 13, we can require that both algorithms can match the production data very well. In Fig 14, we can see that the objective function optimized by Algorithm III has a higher speed of decline than that obtained from Algorithm II initially, but after 100 iterations, the objective function values of the two algorithms are nearly the same.
Through the above comparison of several aspects, we can see that Algorithm III has a higher rate of convergence and reflects the geological information more clearly than Algorithm II, but the variance matrix constituted by the formation parameters must be reduced dimensionally by decomposition before the optimization (in Problem Formulation part), which greatly reduces the number of parameters (it reduces to 100 in this paper). This method increases the  errors between the optimization parameters and the observed data. Thus, when there are too many formation parameters, tthe advantages of Algorithm III over Algorithm II are not very obvious. Therefore, there exists potential for further research in this field.

Conclusions
The present paper has investigated the application of finite differences with a stochastic algorithm for history matching in reservoir models. We use this method to optimize and resolve the Bayesian inverse problem, which depends on posterior distributions. Stochastic algorithms (such as SPSA, directional derivatives) determine the direction of the gradient with a random perturbation. The disadvantages are that the direction of the approximate gradient deviates from the direction of the true gradient; there are a large number of iterations; it has a slow rate of convergence, and it is vulnerable to local loops. The Hybrid algorithm obtains an approximate gradient that is much closer to true gradient by partial finite difference. It increases the accuracy of the search direction and improves the rate of convergence. This paper has verified (by the mathematical model and reservoir examples) that the Hybrid algorithm has the following advantages: • Hybrid algorithm introduced with finite difference on the basis of a stochastic algorithm has greatly improved the accuracy of the approximate gradient, and this gradient is closer to the true gradient as the iteration steps increase; • The approximate cosine formula to determine the accuracy of the approximate gradient has a high degree of accuracy, which provides a criterion to judge the accuracy of the approximate gradient for actual reservoirs; • Compared with the stochastic algorithm based on a directional derivative, the Hybrid algorithm has a faster rate of convergence and can better describe geologicalinformation.