Bayesian optimization for computationally extensive probability distributions

  • Ryo Tamura ,

    Contributed equally to this work with: Ryo Tamura, Koji Hukushima

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    tamura.ryo@nims.go.jp (RT); hukusima@phys.c.u-tokyo.ac.jp (KH)

    Affiliations International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan, Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan

  • Koji Hukushima

    Contributed equally to this work with: Ryo Tamura, Koji Hukushima

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    tamura.ryo@nims.go.jp (RT); hukusima@phys.c.u-tokyo.ac.jp (KH)

    Affiliations Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan, Department of Basic Science, Graduate School of Arts and Sciences, The University of Tokyo, Komaba, Meguro, Tokyo 153-8902, Japan

Abstract

An efficient method for finding a better maximizer of computationally extensive probability distributions is proposed on the basis of a Bayesian optimization technique. A key idea of the proposed method is to use extreme values of acquisition functions obtained by Gaussian processes as the next training samples, since such points should be located near a local or global maximum of the probability distribution. Our Bayesian optimization technique is applied to the posterior distribution in effective physical model estimation, which is a computationally extensive probability distribution. Even when the number of sampling points on the posterior distribution is fixed to be small, the Bayesian optimization provides a better maximizer of the posterior distribution than the random search method, the steepest descent method, or the Monte Carlo method. Furthermore, the Bayesian optimization improves the results efficiently when combined with the steepest descent method, and it is thus a powerful tool for searching for a better maximizer of computationally extensive probability distributions.

Introduction

Bayesian optimization [1–5] has recently attracted much attention as a method to search for the maximizer/minimizer of a black-box function in informatics and materials science [6–12]. In this method, the black-box function is interpolated by Gaussian processes, and the interpolated function is used to predict the maximizer/minimizer of the black-box function. Bayesian optimization is effective for problems in which values of the black-box function cannot be obtained easily, in other words, when the data available for the black-box function are limited.

We are currently developing a generic method to estimate an effective physical model from experimentally measured data using machine learning, which relates to calibration in data science [13–15]. As a first example, we developed a method to estimate a set of model parameters x = (x1, …, xK) in the Hamiltonian ℋ(x), where K is the number of model parameters [16]. Let yex be the set of physical quantities {yex(gl)}l=1,…,L depending on the external parameter gl, with L being the number of data points. By using Bayes' theorem, the posterior distribution P(x|yex), or the conditional probability of x given yex, is expressed as

P(x|yex) = P(yex|x)P(x)/Z(yex),   (1)

where P(x) and Z(yex) are the prior distribution of the model parameters and the normalization constant of the posterior distribution, respectively. Assuming that the observation noise follows a Gaussian distribution with a mean of zero and a standard deviation of σ, the likelihood function P(yex|x) is given, up to a normalization constant, as

P(yex|x) ∝ exp[−Σl=1,…,L (yex(gl) − ycal(gl, x))²/(2σ²)],   (2)

where {ycal(gl, x)}l=1,⋯,L is the gl dependence of the physical quantity calculated from ℋ(x); hereinafter, let ycal(x) denote the set {ycal(gl, x)}l=1,⋯,L. Then, the posterior distribution is expressed as

P(x|yex) ∝ P(x) exp[−E(x)/(2σ²)],   (3)

where the "energy function" of x is given by

E(x) = Σl=1,…,L (yex(gl) − ycal(gl, x))².   (4)

From the viewpoint of maximum a posteriori (MAP) estimation, the plausible model parameters for explaining yex are obtained as the maximizer of Eq (3) or, for a uniform prior, as the minimizer of Eq (4). Thus, the most fundamental task in constructing an effective model reduces to maximizing Eq (3) or minimizing Eq (4).
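As a minimal illustration of Eq (4), the following sketch evaluates the energy function for a given parameter set; the simulator y_cal passed in as a callable is a hypothetical stand-in for the inner-loop calculation described below.

import numpy as np

def energy(x, g, y_ex, y_cal):
    """Energy function of Eq (4): sum of squared residuals between the
    measured quantities y_ex(g_l) and the simulated ones y_cal(g_l, x).

    x     : array of model parameters (x_1, ..., x_K)
    g     : array of external parameters g_l, l = 1, ..., L
    y_ex  : array of measured values y_ex(g_l)
    y_cal : callable (g, x) -> array of simulated values y_cal(g_l, x)
    """
    residual = y_ex - y_cal(g, x)
    return float(np.sum(residual ** 2))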

A computational method to evaluate the posterior distribution or the energy function consists of a double-loop calculation. In the inner loop, the physical quantities ycal(x) are calculated from ℋ(x) when a set of model parameters is given. The computational cost of the inner loop depends on the simulation method used to calculate ycal(x). As discussed in Ref. [16], the steepest descent method is a promising way to perform this calculation when the input data are assumed to be explained by a simple classical Hamiltonian at zero temperature. In general, however, evaluating ycal(x) for a given ℋ(x) requires statistical/quantum mechanical many-body calculations, such as the Markov-chain Monte Carlo (MCMC) method [17–21], the exact diagonalization method [22–24], and the density matrix renormalization group method [25–27], all of which drastically increase the computational cost of the inner loop.

In the outer loop, sampling of the model parameters x from the posterior distribution is performed. In Ref. [16], we used the MCMC method combined with the exchange Monte Carlo method [28]. Although this combined method efficiently yields the global maximum of the probability distribution even when many local maxima exist, the enormous number of sampling points it requires is very time consuming. Consequently, the MCMC approach to the outer loop of a complicated effective model estimation is one of the main obstacles for applications in materials science.

In this paper, a computational method that estimates the effective model with a reduced outer-loop computational cost is discussed on the basis of a Bayesian optimization for computationally extensive probability distributions. In our Bayesian optimization technique, the extreme values of acquisition functions obtained by Gaussian processes are used as candidates for maximizers of Eq (3) or minimizers of Eq (4). We investigate the efficiency of our Bayesian optimization technique in searching for the minimizer of E(x) defined by Eq (4), relative to the random search method, the steepest descent method, and the Monte Carlo method, when the number of sampling points is fixed to be small. In our demonstrations, the magnetization curve of the classical Ising model calculated by the mean-field approximation and the specific heat of the quantum Heisenberg model calculated by the exact diagonalization method are treated as the input measured data. We find that the Bayesian optimization is useful for searching for a better maximizer of a computationally extensive probability distribution.

Bayesian optimization

Gaussian process

Gaussian processes are a powerful machine learning technique for estimating unknown data from known data sets [29]. Here we consider the case where the given data set is {xn, E(xn)}n=1,…,N, where N is the number of data points. In our case, xn is a set of model parameters of the effective physical model and E(xn) denotes the value of the energy function E(x) defined by Eq (4) at xn. Using a zero-mean Gaussian process, the conditional probability P(E(x)|x) of E(x) given any x is written as a Gaussian distribution with a mean of μ(x) and a standard deviation of δ(x):

μ(x) = k⊤(K + λIN)⁻¹E,   (5)

δ(x)² = c − k⊤(K + λIN)⁻¹k,   (6)

where IN is the N-dimensional identity matrix. Furthermore, E, k, K, and c are defined as

E = (E(x1), …, E(xN))⊤,   (7)

k = (k(x, x1), …, k(x, xN))⊤,   (8)

(K)ij = k(xi, xj),   (9)

c = k(x, x),   (10)

where k(xi, xj) is the Gauss kernel function:

k(xi, xj) = exp(−‖xi − xj‖²/(2γ²)).   (11)

Although the computational cost of Gaussian processes is O(N³), methods to reduce it, including approximation methods, and their efficiencies are currently under investigation [30, 31].

In this formulation, λ and γ are hyperparameters, which should be specified prior to the analysis. While various methods have been proposed to determine hyperparameters, we adopt the cross-validation method, in which λ and γ are chosen so as to minimize the prediction error. In the cross validation, the data set D = {xn, E(xn)}n=1,…,N is randomly divided into S data subsets, each denoted by Ds with s = 1, …, S. One of the S data subsets is regarded as the test data, while the remaining S − 1 subsets are used as the training data. For each data subset Gs = D \ Ds, Gaussian process training is performed with the training data {xn, E(xn)}n∈Gs. The mean-square error between the test data E(xn) and the estimates μ(xn) for n ∈ Ds is then evaluated. The cross validation regards this mean-square error as the prediction error when the test data Ds are treated as unknown data. The optimal values of λ and γ are those that minimize the prediction error averaged over the S data subsets.
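A minimal sketch of the prediction of Eqs (5)–(11) with the Gauss kernel is given below, assuming the hyperparameters lam and gamma have already been fixed (for example, by the cross validation described above); the function and variable names are ours, not part of any particular library.

import numpy as np

def gauss_kernel(a, b, gamma):
    """Gauss kernel of Eq (11): exp(-||x_i - x_j||^2 / (2 gamma^2))."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * gamma ** 2))

def gp_predict(x_new, X, E, lam, gamma):
    """Posterior mean mu(x) and standard deviation delta(x) of E(x) at the
    points x_new, given training data {x_n, E(x_n)}; cf. Eqs (5) and (6)."""
    Kmat = gauss_kernel(X, X, gamma) + lam * np.eye(len(X))  # Eq (9) plus lambda I_N
    kvec = gauss_kernel(x_new, X, gamma)                     # Eq (8)
    coef = np.linalg.solve(Kmat, E)
    mu = kvec @ coef                                         # Eq (5)
    v = np.linalg.solve(Kmat, kvec.T)
    var = 1.0 - np.sum(kvec * v.T, axis=1)                   # Eq (6) with c = k(x, x) = 1
    return mu, np.sqrt(np.maximum(var, 0.0))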

Bayesian optimization for computationally extensive probability distributions

We introduce a Bayesian optimization technique to find a better minimizer of the energy function E(x) defined by Eq (4) when the number of sampling points is limited. Our Bayesian optimization comprises the following procedure:

  1. Step 1: Sets of model parameters xn are randomly generated for n = 1, …, P, and E(xn) is calculated for each generated xn. That is, P calculations of ycal(xn) from ℋ(xn) are necessary.
  2. Step 2: A Gaussian process is trained on the data set {xn, E(xn)}n=1,…,P, yielding the mean value μ(x) and the standard deviation δ(x) of P(E(x)|x).
  3. Step 3: The steepest descent method with randomly chosen initial parameters is applied to the three types of acquisition functions [32–35], written here so that smaller values indicate more promising candidates:

     fPI(x) = −Φ(Z),   (12)

     fEI(x) = −δ(x)[ZΦ(Z) + ϕ(Z)],   (13)

     fLCB(x) = μ(x) − κ√(2 log(|X|t²π²/(6ϵ))) δ(x),   (14)

     with Z = (Emin − μ(x))/δ(x), where κ > 0 and 0 < ϵ ≤ 1 are the hyperparameters, |X| is the size of the search space, and t is the iteration step of the Bayesian optimization. Furthermore, ϕ(Z) and Φ(Z) are the standard normal probability density function and its cumulative distribution function, respectively, and Emin is the present minimum value of E(x). A local or global minimum x* of the acquisition function is then obtained, and Q different model parameters are generated by repeating this operation (see the sketch after this list). Note that the fixed value ϵ = 0.5 is used in the analysis of this paper for simplicity.
  4. Step 4: E(x*) is calculated for each x* obtained in Step 3. By adding the new data, the data set is updated to {xn, E(xn)}n=1,…,P+Q. Here, the Q calculations of ycal(xn) from ℋ(xn) are necessary.
  5. Step 5: Steps 2–4 are repeated R times. In each iteration, the number of data points is increased by Q evaluations.
  6. Step 6: Finally, the minimum value of E(x) from {xn, E(xn)}n=1,…,P+Q×R is determined.
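The sketch below evaluates the three acquisition functions as written in Eqs (12)–(14) for a single candidate point; the names kappa, eps, t, and size_x stand for κ, ϵ, t, and |X|, and the default values are placeholders rather than settings taken from this work.

import numpy as np
from scipy.stats import norm

def acquisitions(mu, delta, e_min, kappa=20.0, eps=0.5, t=1, size_x=1.0e6):
    """Acquisition functions of Eqs (12)-(14), written so that smaller
    values indicate more promising candidates (they are minimized).

    mu, delta : Gaussian-process mean and standard deviation at the point
    e_min     : present minimum value of E(x) in the training data
    """
    delta = max(delta, 1e-12)                        # guard against division by zero
    z = (e_min - mu) / delta
    f_pi = -norm.cdf(z)                              # Eq (12)
    f_ei = -delta * (z * norm.cdf(z) + norm.pdf(z))  # Eq (13)
    beta = kappa * np.sqrt(2.0 * np.log(size_x * t ** 2 * np.pi ** 2 / (6.0 * eps)))
    f_lcb = mu - beta * delta                        # Eq (14)
    return f_pi, f_ei, f_lcb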

We emphasize that the number of calculations of ycal(xn) from ℋ(xn) in this procedure is Ns = P + Q × R, which corresponds to the number of sampling points on E(x). The computational cost of Step 3 is low because μ(x) and δ(x) are quickly obtained for a given x. Thus, many candidates for a local or global minimum of E(x) are generated from the acquisition functions without calculating E(x), which is the key of our Bayesian optimization. Note that an alternative approach has been proposed for optimizing a continuous function with an easily calculable statistical function defined only on discrete grid points, in contrast to our method [36].
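A compact sketch of the whole procedure (Steps 1–6) might look as follows, assuming hypothetical callables: energy for the expensive inner loop E(x), gp_fit for the Gaussian-process training returning a predictor x → (μ(x), δ(x)), and acquisition returning a single value to be minimized; the coordinate-wise finite-difference descent is a crude stand-in for the steepest descent used in Step 3.

import numpy as np

def bayesian_optimization(energy, acquisition, gp_fit, bounds,
                          P=200, Q=10, R=10, n_descent=100,
                          alpha=0.01, dx=0.01, rng=None):
    """Sketch of Steps 1-6 of the procedure described above."""
    rng = rng or np.random.default_rng()
    low, high = bounds[:, 0], bounds[:, 1]
    K = len(bounds)

    # Step 1: P randomly generated parameter sets and their energies.
    X = rng.uniform(low, high, size=(P, K))
    E = np.array([energy(x) for x in X])

    for _ in range(R):                                    # Step 5: repeat R times
        predict = gp_fit(X, E)                            # Step 2: train the Gaussian process
        e_min = E.min()
        candidates = []
        for _ in range(Q):                                # Step 3: Q candidate minimizers
            x = rng.uniform(low, high, size=K)
            for _ in range(n_descent):                    # cheap: no call to energy()
                k = rng.integers(K)
                trial = x.copy()
                trial[k] += dx
                f0 = acquisition(*predict(x), e_min)
                f1 = acquisition(*predict(trial), e_min)
                x[k] -= alpha * (f1 - f0) / dx
            candidates.append(x)
        E_new = np.array([energy(x) for x in candidates]) # Step 4: Q new evaluations
        X = np.vstack([X, candidates])
        E = np.concatenate([E, E_new])

    best = int(np.argmin(E))                              # Step 6: best point found
    return X[best], E[best]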

Results

Application for posterior distribution based on a classical Ising model

We demonstrate an application to the posterior distribution in effective physical model estimation based on a classical Ising model in two dimensions. The model Hamiltonian of the classical Ising model under a magnetic field H is defined by

ℋ = −Σ⟨i,j⟩ Jij σiσj − H Σi σi,  σi = ±1,   (15)

where Jij is the exchange interaction between the i-th spin and the j-th spin. Here, we consider three types of exchange interactions on the square lattice shown in Fig 1(a). In this case, three model parameters are to be estimated, that is, x = (x1, x2, x3) = (J1, J2, J3).

Fig 1.

(a) Lattice and types of exchange interactions considered in the classical Ising model defined by Eq (15). (b) Input magnetization curve {mex(Hl)}l=1,…,L with L = 200, where (x1, x2, x3) = (−1.0, −0.5, 0.3) is used at temperature T = 3.0.

https://doi.org/10.1371/journal.pone.0193785.g001

To discuss the efficiency of the proposed method for the effective model estimation, a synthetic magnetization curve {mex(Hl)}l=1,…,L generated by the same model, Eq (15), is used as the input data. By performing mean-field calculations for the four-sublattice model, the magnetic field dependence of the magnetization is calculated with (x1, x2, x3) = (−1.0, −0.5, 0.3) at temperature T = 3.0. Here, the Boltzmann constant is set to unity and |J1| is taken as the unit of energy. Gaussian noise with a mean of zero and a standard deviation of 0.004 is added to the obtained magnetization curve. Fig 1(b) shows the input magnetization curve {mex(Hl)}l=1,…,L, where the number of data points is L = 200.

To estimate the effective model from {mex(Hl)}l=1,…,L, we search for the maximizer of the posterior distribution, which is defined as

P(x|mex) ∝ P(x) exp[−EC(x)/(2σ²)],   (16)

EC(x) = Σl=1,…,L (mex(Hl) − mcal(Hl, x))²,   (17)

where {mcal(Hl, x)}l=1,…,L is the magnetization curve calculated from ℋ(x). In this demonstration, the mean-field calculation for the four-sublattice model is used as the inner-loop method to obtain {mcal(Hl, x)}l=1,…,L. Furthermore, instead of treating the posterior distribution itself, we search for the minimizer of EC(x). For simplicity, the prior distribution of the model parameters P(x) is assumed to be a uniform distribution, that is, P(x) = 1, which corresponds to least-squares fitting; the factor 1/(2σ²) can then be set to a constant without loss of generality.

The minimum values of EC(x) obtained by the random search method, the steepest descent method, the Monte Carlo method, and our Bayesian optimization are compared as functions of the number of sampling points Ns on EC(x). The details of each method are described below.

Random search method.

A set of model parameters xn = (x1, x2, x3) is randomly generated from the region −5 ≤ x1, x2, x3 ≤ 5, and EC(xn) is calculated. This procedure is repeated Ns times to obtain the data set {xn, EC(xn)}n=1,…,Ns, from which the minimum value of EC(x) is searched.
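A sketch of this baseline, assuming a hypothetical energy callable for EC(x) and a bounds array of lower and upper limits for each parameter:

import numpy as np

def random_search(energy, bounds, n_samples, rng=None):
    """Random search baseline: sample n_samples parameter sets uniformly
    in the box defined by bounds and return the best one found."""
    rng = rng or np.random.default_rng()
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_samples, len(bounds)))
    E = np.array([energy(x) for x in X])
    best = int(np.argmin(E))
    return X[best], E[best]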

Steepest descent method.

An initial set of model parameters [i.e., x1 = (x1, x2, x3)] is randomly generated from the region −5 ≤ x1, x2, x3 ≤ 5. The set of model parameters is then updated Ns/2 times from xn = (x1, …, xk, …, xK) to xn+1 by using

xn+1 = (x1, …, xk − α ∂EC(xn)/∂xk, …, xK),   (18)

∂EC(xn)/∂xk ≃ [EC(x1, …, xk + Δx, …, xK) − EC(xn)]/Δx.   (19)

Here, k is randomly chosen from k ∈ 1, …, K, where K = 3 in this case, and Δx = α = 0.01. Notice that the calculation of EC(x) must be performed twice in each update. Thus, when the number of updates is Ns/2, the number of sampling points on EC(x) becomes Ns. With this update of the model parameters, EC(x) decreases at each update. From the obtained data set, the minimum value of EC(x) is searched.
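A sketch of this update rule with the forward-difference gradient of Eqs (18) and (19); energy is again a hypothetical callable standing for EC(x):

import numpy as np

def steepest_descent(energy, x0, n_updates, alpha=0.01, dx=0.01, rng=None):
    """Steepest descent of Eqs (18)-(19): a randomly chosen coordinate k is
    moved against a forward-difference gradient.  Each update costs two
    evaluations of the energy function, so Ns = 2 * n_updates."""
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    visited = []                              # all sampled (x, E(x)) pairs
    for _ in range(n_updates):
        k = rng.integers(len(x))
        e0 = energy(x)                        # first evaluation
        trial = x.copy()
        trial[k] += dx
        e1 = energy(trial)                    # second evaluation
        visited.append((x.copy(), e0))
        visited.append((trial, e1))
        x[k] -= alpha * (e1 - e0) / dx        # Eqs (18) and (19)
    return min(visited, key=lambda pair: pair[1])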

Monte Carlo method.

An initial set of model parameters [i.e., x1 = (x1, x2, x3)] is randomly generated from the region −5 ≤ x1, x2, x3 ≤ 5. The set of model parameters is updated Ns times from xn to xn+1 using the following Metropolis-type transition probability:

xn+1 = x′n+1 with probability W(xn → x′n+1), and xn+1 = xn otherwise,   (20)

W(xn → x′n+1) = min{1, exp[−(EC(x′n+1) − EC(xn))/(2σ²)]}.   (21)

Here, the candidate set of model parameters after updating is prepared as x′n+1 = (x1, …, xk + r, …, xK) from the set of model parameters before updating, xn = (x1, …, xk, …, xK), where k is randomly chosen from k ∈ 1, …, K and r is a random number between −1 and +1. From the obtained data set, the minimum value of EC(x) is searched.
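A sketch of this sampler under the transition probability of Eqs (20) and (21); the factor 2σ², which is treated as a constant in the text, enters as the parameter two_sigma_sq, and energy is the same hypothetical callable as above:

import numpy as np

def metropolis_search(energy, x0, n_updates, two_sigma_sq=1.0, rng=None):
    """Metropolis-type sampling of Eqs (20)-(21): a randomly chosen
    coordinate is shifted by a uniform random number r in [-1, +1] and the
    move is accepted with probability min{1, exp(-dE / two_sigma_sq)}."""
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    e = energy(x)
    best_x, best_e = x.copy(), e
    for _ in range(n_updates):
        k = rng.integers(len(x))
        trial = x.copy()
        trial[k] += rng.uniform(-1.0, 1.0)            # r in [-1, +1]
        e_trial = energy(trial)
        if e_trial < best_e:
            best_x, best_e = trial, e_trial
        if rng.random() < np.exp(min(0.0, -(e_trial - e) / two_sigma_sq)):
            x, e = trial, e_trial                     # accept the move
    return best_x, best_e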

Bayesian optimization.

A set of model parameters xn is randomly generated from the region −5 ≤ x1, x2, x3 ≤ 5, and EC(xn) is calculated. This procedure is repeated P = 200 times to prepare the initial data set, and the Bayesian optimization is then performed with Q = 10 and R = (Ns − P)/Q. In this method, the steepest descent method in Step 3 is implemented by updating from x = (x1, …, xk, …, xK) to x′ according to

x′ = (x1, …, xk − α ∂f(x)/∂xk, …, xK),   (22)

∂f(x)/∂xk ≃ [f(x1, …, xk + Δx, …, xK) − f(x)]/Δx,   (23)

where f(x) expresses the acquisition functions defined by Eqs (12), (13) and (14). Here, k is randomly chosen from k ∈ 1, …, K, and Δx = α = 0.01. The functions μ(x) and δ(x) entering f(x) are obtained from the Gaussian process. In our calculation, the steepest descent method is performed with 100 updates to obtain the extreme value of f(x). From the obtained data set, the minimum value of EC(x) is searched.

Fig 2(a) shows the dependence on the sampling number Ns of the averaged minimum value Eav of EC(x) over 100 independent runs for each method. The error bars are calculated from the standard deviation. The Bayesian optimization yields the smallest Eav, indicating that it gives better minimizers of EC(x) even when Ns is small. Furthermore, the most successful analysis is given by the Bayesian optimization using fLCB(x) with κ = 20, while the steepest descent method and the Monte Carlo method produce worse results than the random search method. These methods are frequently trapped at a local minimum, depending on the initial set of model parameters, and consequently Eav remains large.

Fig 2. Results of the average Eav of the minimum values of EC(x) obtained from 100 independent runs in the effective model estimation of the classical Ising model.

(a) Eav as a function of Ns, which is the number of sampling points on EC(x), obtained from the random search method (red circles), the steepest descent method (yellow circles), the Monte Carlo method (green circles), and the Bayesian optimization (blue circles). (b) Eav as a function of Ns obtained from the random search method (RS) (red circles), the Bayesian optimization using fLCB(x) with κ = 20 (BO) (blue circles), the random search method with the steepest descent method (RS+SD) (red diamonds), and the Bayesian optimization with the steepest descent method (BO+SD) (blue diamonds). Dashed lines connect the initial Eav (circles) obtained by RS or BO alone and the Eav (diamonds) obtained by performing the steepest descent method with 50 updates after RS or BO.

https://doi.org/10.1371/journal.pone.0193785.g002

Fig 3(a) shows the distribution of the estimated model parameters over 100 independent runs for various Ns, obtained by the random search method and the Bayesian optimization. The black lines indicate the exact solutions from which the input magnetization curve without Gaussian noise is generated, except when a parameter xk is zero. As Ns increases, the results of the Bayesian optimization converge on the black lines, implying that the model parameters can be correctly estimated with high probability. On the other hand, the random search method shows no significant improvement with increasing Ns. This can be understood by noting that the accuracy of the acquisition functions obtained by Gaussian processes in the Bayesian optimization improves as the number of sampling points Ns increases, whereas the random search method does not make use of the previous sampling points.

Fig 3. Results of the estimated model parameters in the effective model estimation based on the classical Ising model.

(a) Distribution of the estimated model parameters from 100 independent runs depending on Ns by the random search method (RS) (red circles) and the Bayesian optimization using fLCB(x) with κ = 20 (BO) (blue circles). The black lines indicate the exact solutions from which the input magnetization curve without Gaussian noise is generated. (b) Distribution of the estimated model parameters by the random search method with the steepest descent method (RS+SD) (red diamonds) and the Bayesian optimization with the steepest descent method (BO+SD) (blue diamonds). In these cases, starting from the results shown in (a) by RS and BO, the steepest descent method is further performed with 50 updates.

https://doi.org/10.1371/journal.pone.0193785.g003

The Bayesian optimization, like the random search method, does not in general take into account the local structure of the energy function, such as its gradient in the parameter space. To improve the solutions, we consider combining the steepest descent method with the random search method or the Bayesian optimization. One may expect that the steepest descent method reaches a local or global minimum near the model parameters estimated by the random search method or the Bayesian optimization. That is, the model parameters estimated by the random search method or the Bayesian optimization are used as the initial set of model parameters for the steepest descent method, which is performed with 50 updates. Fig 2(b) compares Eav for the random search method, the Bayesian optimization using fLCB(x) with κ = 20, and these methods combined with the steepest descent method. A drastic improvement is confirmed even with only 50 updates in the steepest descent method. Note that if the number of updates in the steepest descent method is increased, the obtained Eav should improve further. However, since the number of sampling points also increases, the trade-off between the search for initial sets by the random search method or the Bayesian optimization and the evaluation of the local structure by the steepest descent method should be optimized. We confirmed for several cases that the Bayesian optimization with the steepest descent method is the best among the considered methods.
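This two-stage strategy can be expressed as a simple composition; global_search and local_descent are hypothetical callables with, for example, the signatures of the random-search and steepest-descent sketches given earlier:

def refine_with_descent(global_search, local_descent, energy, bounds,
                        n_descent_updates=50):
    """Two-stage search: a global stage (random search or Bayesian
    optimization) proposes a starting point, which is then refined by a
    small, fixed number of steepest-descent updates."""
    x_start, _ = global_search(energy, bounds)
    return local_descent(energy, x_start, n_descent_updates)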

Fig 3(b) shows the distribution of the estimated model parameters. For the Bayesian optimization with the steepest descent method, the true minimizer of EC(x) is found in all independent runs, whereas some of the results obtained by the random search method with the steepest descent method differ from the exact solutions; these cases are trapped in local minima. The steepest descent method significantly improves the estimates by both the Bayesian optimization and the random search method. These results imply that the Bayesian optimization combined with the steepest descent method is a powerful tool for finding the global minimum of EC(x).

Application for posterior distribution based on a quantum Heisenberg model

We next consider a case in which the number of model parameters is larger than in the previous example, using a quantum Heisenberg model on a one-dimensional chain (Fig 4(a)). The model Hamiltonian of the quantum Heisenberg model under a magnetic field H is defined by

ℋ = Σ⟨i,j⟩ Jij (σi^x σj^x + σi^y σj^y + Δ σi^z σj^z) − H Σi σi^z,   (24)

where Δ is the anisotropy parameter and σi^x, σi^y, σi^z are the Pauli matrices. Here, the model parameters are x = (x1, x2, x3, x4, x5) = (J1, J2, J3, Δ, H). Fig 4(a) depicts the three types of exchange interactions.

Fig 4.

(a) Lattice and types of exchange interactions considered in the quantum Heisenberg model defined by Eq (24). (b) Input specific heat {Cex(Tl)}l=1,…,L with L = 200 and (x1, x2, x3, x4, x5) = (1.0, 0.8, −0.2, −0.7, 0.3).

https://doi.org/10.1371/journal.pone.0193785.g004

This demonstration uses the temperature dependence of the specific heat as the input data. The input specific heat {Cex(Tl)}l=1,…,L is generated from the model defined by Eq (24) as follows. By performing the exact diagonalization method, the temperature dependence of the specific heat for (x1, x2, x3, x4, x5) = (1.0, 0.8, −0.2, −0.7, 0.3) is calculated. Gaussian noise with a mean of zero and a standard deviation of 0.004 is added to the obtained specific heat. Fig 4(b) shows the temperature dependence of the specific heat with L = 200, which is used as the input in the effective model estimation. As in the previous case, our task is to search for the minimizer of the energy function EQ(x) defined as

EQ(x) = Σl=1,…,L (Cex(Tl) − Ccal(Tl, x))²,   (25)

where {Ccal(Tl, x)}l=1,…,L is the specific heat calculated from ℋ(x) by performing the exact diagonalization method.
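For completeness, a sketch of the inner loop for this demonstration is shown below: it exactly diagonalizes a small chain with the Hamiltonian of Eq (24) and evaluates C(T) = (⟨ℋ²⟩ − ⟨ℋ⟩²)/T² with the Boltzmann constant set to unity; the bond list encoding J1, J2, and J3 is left to the caller, since the precise coupling pattern is specified only in Fig 4(a).

import numpy as np

def heisenberg_specific_heat(bonds, n_sites, delta, h, temperatures):
    """Exact diagonalization of the Hamiltonian of Eq (24) on a small chain
    and the specific heat C(T) = (<H^2> - <H>^2) / T^2 with k_B = 1.

    bonds        : list of (i, j, J_ij) couplings
    n_sites      : number of spins (the matrix is 2^n_sites x 2^n_sites)
    delta, h     : anisotropy and magnetic field of Eq (24)
    temperatures : array of temperatures T_l
    """
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    def site_op(op, i):
        """Pauli operator acting on site i of the chain."""
        mats = [np.eye(2, dtype=complex)] * n_sites
        mats[i] = op
        out = mats[0]
        for m in mats[1:]:
            out = np.kron(out, m)
        return out

    dim = 2 ** n_sites
    ham = np.zeros((dim, dim), dtype=complex)
    for i, j, jij in bonds:
        ham += jij * (site_op(sx, i) @ site_op(sx, j)
                      + site_op(sy, i) @ site_op(sy, j)
                      + delta * site_op(sz, i) @ site_op(sz, j))
    for i in range(n_sites):
        ham -= h * site_op(sz, i)

    energies = np.linalg.eigvalsh(ham)
    energies -= energies.min()                 # shift for numerical stability
    c_values = []
    for t in temperatures:
        w = np.exp(-energies / t)
        z = w.sum()
        e_mean = (energies * w).sum() / z
        e2_mean = (energies ** 2 * w).sum() / z
        c_values.append((e2_mean - e_mean ** 2) / t ** 2)
    return np.array(c_values)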

We compared Eav, the average of the minimum value of EQ(x) over 100 independent runs, for the random search method, the steepest descent method, the Monte Carlo method, and the Bayesian optimization (Fig 5(a)). The setups of these methods are the same as in the previous case, except for the number of model parameters (K = 5) and the region from which a set of model parameters is randomly generated; in this case, we use −3 ≤ x1, x2, x3 ≤ 3 and −2 ≤ x4, x5 ≤ 2. The results are qualitatively the same as in the previous case. The most successful analysis is produced by the Bayesian optimization using fEI(x). This differs from the previous demonstration, which means that the appropriate acquisition function depends on the target physical model and the input physical quantities. Furthermore, as shown in Fig 5(b), combining with the steepest descent method again improves the estimates of the Bayesian optimization and the random search method. As in the previous case, the Bayesian optimization with the steepest descent method gives a better minimizer of EQ(x). Consequently, we conclude that the Bayesian optimization is useful for finding a better maximizer of the posterior distribution in an effective model estimation with a small number of sampling points.

Fig 5. Results of the average Eav of the minimum values of EQ(x) obtained from 100 independent runs in the effective model estimation of the quantum Heisenberg model.

(a) Eav as a function of Ns obtained from the random search method (red circles), the steepest descent method (yellow circles), the Monte Carlo method (green circles), and the Bayesian optimization (blue circles). (b) Eav as a function of Ns obtained from the random search method (RS) (red circles), the Bayesian optimization using fEI(x) (BO) (blue circles), the random search method with the steepest descent method (RS+SD) (red diamonds), and the Bayesian optimization with the steepest descent method (BO+SD) (blue diamonds). In the steepest descent method, 50 updates are performed after RS or BO.

https://doi.org/10.1371/journal.pone.0193785.g005

Discussion

We searched for a better maximizer of a posterior distribution in effective physical model estimation, which is a computationally extensive probability distribution, using the Bayesian optimization. For at least two simple models, the Bayesian optimization is found to be more efficient at finding a better maximizer of the posterior distribution than the random search method, the steepest descent method, and the Monte Carlo method when the number of sampling points on the posterior distribution is fixed to be small, although the acquisition function providing the highest efficiency still depends on the problem to be solved. Our Bayesian optimization has several hyperparameters, i.e., P, Q, and R. Although we did not optimize these hyperparameters, the Bayesian optimization remains a better method for obtaining the maximizer of the posterior distribution. In particular, since the value of Q is related to the batch/parallel problem of Bayesian optimization [37, 38], some improvement in performance is expected by tuning Q. Furthermore, the combination of the Bayesian optimization and the steepest descent method drastically increases the efficiency of finding a better maximizer of the posterior distribution. The key of our Bayesian optimization is to predict sets of model parameters near a local or global maximum of the posterior distribution from the extreme values of acquisition functions obtained by Gaussian processes, which requires a relatively low computational cost. Consequently, model parameters near the global maximum can be found with high probability. These facts suggest that the Bayesian optimization will be a powerful tool for effective model estimation. However, for posterior distributions with various types of prior distributions and a large number of model parameters, the Bayesian optimization may not always be useful in finding a maximizer. In the future, we will therefore evaluate effective model estimation using the Bayesian optimization for actual materials. Because the maximizer of a probability distribution is searched for in many scientific fields, the Bayesian optimization will play an important role in the promotion of science.

Acknowledgments

We thank Shu Tanaka for the useful discussions. R. T. was partially supported by the Nippon Sheet Glass Foundation for Materials Science and Engineering. K. H. was partially supported by Grants-in-Aid for Scientific Research from JSPS, Japan (Grants No. 25120010 and No. 25610102). The computations in the present work were performed on the Numerical Materials Simulator at NIMS and the supercomputer at the Supercomputer Center, Institute for Solid State Physics, The University of Tokyo. This work was done as part of the “Materials Research by Information Integration” Initiative of the Support Program for Starting Up Innovation Hub, Japan Science and Technology Agency.

References

  1. Mockus J. Bayesian approach to global optimization: Theory and applications. Springer; 1989.
  2. Jones DR, Schonlau M, Welch WJ. Efficient global optimization of expensive black-box functions. J Global Optim 1998; 13: 455–492.
  3. Pelikan M, Goldberg DE, Cantú-Paz E. BOA: the Bayesian optimization algorithm. GECCO’99 Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation 1999; 525–532.
  4. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems 25; 2012.
  5. Ueno T, Rhone TD, Hou Z, Mizoguchi T, Tsuda K. COMBO: An efficient Bayesian optimization library for materials science. Materials Discovery 2016; 4: 18–21.
  6. Seko A, Maekawa T, Tsuda K, Tanaka I. Machine learning with systematic density-functional theory calculations: Application to melting temperatures of single- and binary-component solids. Phys Rev B 2014; 89: 054303-1-9.
  7. Toyoura K, Hirano D, Seko A, Shiga M, Kuwabara A, Karasuyama M, et al. Machine-learning-based selective sampling procedure for identifying the low-energy region in a potential energy surface: A case study on proton conduction in oxides. Phys Rev B 2016; 93: 054112-1-11.
  8. Kiyohara S, Oda H, Tsuda K, Mizoguchi T. Acceleration of stable interface structure searching using a kriging approach. Jpn J Appl Phys 2016; 55: 045502-1-4.
  9. Balachandran PV, Xue D, Theiler J, Hogden J, Lookman T. Adaptive strategies for materials design using uncertainties. Sci Rep 2016; 6: 19660. pmid:26792532
  10. Ju S, Shiga T, Feng L, Hou Z, Tsuda K, Shiomi J. Designing nanostructures for phonon transport via Bayesian optimization. Phys Rev X 2017; 7: 021024-1-10.
  11. Packwood DM, Hitosugi T. Rapid prediction of molecule arrangements on metal surfaces via Bayesian optimization. Appl Phys Express 2017; 10: 065502-1-4.
  12. Seko A, Hayashi H, Nakayama K, Takahashi A, Tanaka I. Representation of compounds for machine-learning prediction of physical properties. Phys Rev B 2017; 95: 144110-1-11.
  13. Kennedy MC, O’Hagan A. Bayesian calibration of computer models. J Roy Stat Soc B 2001; 63: 425–464.
  14. Higdon D, Kennedy M, Cavendish JC, Cafeo JA, Ryne RD. Combining field data and computer simulations for calibration and prediction. SIAM J Sci Comput 2004; 26: 448–466.
  15. Liu F, Bayarri MJ, Berger JO. Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Anal 2009; 4: 119–150.
  16. Tamura R, Hukushima K. Method for estimating spin-spin interactions from magnetization curves. Phys Rev B 2017; 95: 064407-1-8.
  17. Sandvik AW, Kurkijärvi J. Quantum Monte Carlo simulation method for spin systems. Phys Rev B 1991; 43: 5950–5961.
  18. Wang F, Landau DP. Efficient, multiple-range random walk algorithm to calculate the density of states. Phys Rev Lett 2001; 86: 2050–2053. pmid:11289852
  19. Kawashima N, Harada K. Recent developments of world-line Monte Carlo methods. J Phys Soc Jpn 2004; 73: 1379–1414.
  20. Suwa H, Todo S. Markov chain Monte Carlo method without detailed balance. Phys Rev Lett 2010; 105: 120603-1-4. pmid:20867621
  21. Landau DP, Binder K. A guide to Monte Carlo simulations in statistical physics. Cambridge University Press; 2014.
  22. Lin HQ. Exact diagonalization of quantum-spin models. Phys Rev B 1990; 42: 6561–6567.
  23. Jaklič J, Prelovšek P. Lanczos method for the calculation of finite-temperature quantities in correlated systems. Phys Rev B 1994; 49: 5065–5068.
  24. Yamaji Y, Nomura Y, Kurita M, Arita R, Imada M. First-principles study of the honeycomb-lattice iridates Na2IrO3 in the presence of strong spin-orbit interaction and electron correlations. Phys Rev Lett 2014; 113: 107201-1-5. pmid:25238380
  25. White SR. Density matrix formulation for quantum renormalization groups. Phys Rev Lett 1992; 69: 2863.
  26. Nishino T. Density matrix renormalization group method for 2D classical models. J Phys Soc Jpn 1995; 64: 3598–3601.
  27. Nishino T, Okunishi K. Corner transfer matrix renormalization group method. J Phys Soc Jpn 1996; 65: 891–894.
  28. Hukushima K, Nemoto K. Exchange Monte Carlo method and application to spin glass simulations. J Phys Soc Jpn 1996; 65: 1604–1608.
  29. Bishop C. Pattern recognition and machine learning. Springer-Verlag New York; 2006.
  30. Rahimi A, Recht B. Random features for large-scale kernel machines. Advances in Neural Information Processing Systems 20; 2007.
  31. Heaton MJ, Datta A, Finley A, Furrer R, Guhaniyogi R, Gerber F, et al. Methods for analyzing large spatial data: A review and comparison; 2017. Preprint. Available from: arXiv:1710.05013. Cited 15 January 2018.
  32. Lai TL, Robbins H. Asymptotically efficient adaptive allocation rules. Adv Appl Math 1985; 6: 4–22.
  33. Benassi R, Bect J, Vazquez E. Robust Gaussian process-based global optimization using a fully Bayesian expected improvement criterion. In: Coello Coello CA, editor. Learning and Intelligent Optimization, vol. 6683; 2011. pp. 176–190.
  34. Srinivas N, Krause A, Kakade SM, Seeger MW. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans Inf Theory 2012; 58: 3250–3265.
  35. Shahriari B, Swersky K, Wang Z, Adams RP, de Freitas N. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE 2016; 104: 148–175.
  36. Cox DD, John S. A statistical method for global optimization. IEEE International Conference on Systems, Man, and Cybernetics 1992; 1241–1246.
  37. Chevalier C, Ginsbourger D. Fast computation of the multi-points expected improvement with applications in batch selection. 2012; hal-00732512v2.
  38. Desautels T, Krause A, Burdick JW. Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization. J Mach Learn Res 2014; 15: 4053–4103.