Data-informed deep optimization

Motivated by the impressive success of deep learning in a wide range of scientific and industrial applications, we explore in this work the application of deep learning to a specific class of optimization problems lacking explicit formulas for both the objective function and the constraints. Such optimization problems arise in many design problems, e.g., rotor profile design, in which objective and constraint values are available only through experiment or simulation. They are especially challenging when the design parameters are high-dimensional, due to the curse of dimensionality. In this work, we propose a data-informed deep optimization (DiDo) approach that emphasizes the adaptive fitting of the feasible region, as follows. First, we propose a deep neural network (DNN) based adaptive fitting approach to learn an accurate DNN classifier of the feasible region. Second, we use the DNN classifier to efficiently sample feasible points and train a DNN surrogate of the objective function. Finally, we find optimal points of the DNN surrogate optimization problem by gradient descent. To demonstrate the effectiveness of our DiDo approach, we consider a practical design case in industry, in which our approach yields good solutions using a limited amount of training data. We further use a 100-dimensional toy example to show the effectiveness of our approach for higher-dimensional problems. Our results indicate that, by properly dealing with the difficulty in fitting the feasible region, a DNN-based method like our DiDo approach is flexible and promising for solving high-dimensional design problems with implicit objective and constraints.


Introduction
With the development of technology, we are able to study a scientific or industrial problem through massive amounts of data [7]: for example, in neuroscience, modeling a biological neuronal network to meet a set of biological requirements on its dynamical performance [28]; in industry, optimizing a large set of design parameters to maximize the machine performance while satisfying physical constraints [12,15]. For such practical systems, the dependence between the model or design parameters and the corresponding performance often has no explicit formula [11,20]. Moreover, the constraints in these problems may also be very complex, with no explicit formulas, and whether a large set of design parameters is compatible with the constraints can only be examined through experiments or simulations. Therefore, we often encounter an optimization problem in which both the objective function and the constraints are unknown and are available only on a set of data points. For convenience, we call such optimization problems data-informed optimization problems. In contrast to traditional optimization problems, which are often low-dimensional, can be analytically described, and can be solved by many well-developed algorithms [4,9], high-dimensional data-informed optimization problems increasingly call for tractable approaches.
A popular method to solve data-informed optimization problems is to use surrogate models to fit the objective and constraint functions. However, due to the curse of dimensionality in high-dimensional problems, it is often difficult to obtain a good surrogate model by conventional methods, like polynomial fitting. Empirical and theoretical studies suggest that the DNN approach, trained by gradient-based algorithms, can overcome the curse of dimensionality in fitting high-dimensional functions [14,21]. It has also been observed in practice that DNNs in general do not overfit, even in an overparameterized setting without explicit regularization [26]. A series of studies provide possible mechanisms underlying this non-overfitting puzzle of DNNs. For example, the frequency principle, supported by both experiments and theory [18,24,25,27], shows that DNNs prefer to fit training data with low-frequency functions, which often leads to good generalization performance because real data are dominated by low frequencies. Therefore, a DNN serves as an appropriate surrogate model to learn high-dimensional objective and constraint functions from finite samples. We then use first-order gradient descent to solve the optimization problem. Although this DNN-surrogate optimization problem is nonconvex [10], we empirically find that gradient descent can obtain satisfactory solutions. We call this DNN-surrogate approach to data-informed optimization problems data-informed deep optimization (DiDo). Thanks to well-developed programming frameworks, such as TensorFlow and PyTorch [1,16,17], and computing hardware, such as GPUs, the DiDo approach can be easily and efficiently implemented.
The DiDo approach is briefly described as follows. For the constraints, since neither the explicit formulas nor the number of constraints is completely available, we train a DNN classifier to learn the feasible region, that is, to classify whether each set of variables satisfies all constraints or not. To make the DNN classifier more accurate, the training data for the classifier is sampled using an iterative sampling method, that is, data near the boundary of the classifier is sampled by Langevin Monte Carlo (LMC) and added to the training set to re-train the classifier. To improve the efficiency of sampling high-dimensional data within the feasible region for fitting the objective function by a DNN, we similarly sample the training set by LMC based on the well-trained classifier. Finally, we find candidates of optimal parameters of the DNN-surrogate optimization problem by a traditional algorithm, for example, gradient descent.
For demonstration, we first consider a specific problem in industry, which is to find the 6-dimensional design parameters for the rotor profile of a twin-screw compressor that maximize the actual flow. Without the need to carefully adjust hyper-parameters, the best actual flow found by our DiDo approach is much better than the result obtained by the original hand-crafted approach. To illustrate the effectiveness of the DiDo approach in higher-dimensional problems, we consider a 100-dimensional toy example. The optimal value found with our approach is very close to the true optimal value. These results indicate that our DiDo approach can indeed solve high-dimensional data-informed optimization problems.
The rest of the paper is organized as follows. First, we give brief preliminaries on the notation, DNNs and LMC. We then present the main content of this paper, our data-informed deep optimization approach, that is, a deep-learning-based method for solving a type of optimization problem that differs from the traditional ones. To demonstrate our DiDo approach, a practical design case in industry and a 100-dimensional toy example are then shown in detail. Finally, we draw conclusions and discuss future work.

Notation
In this paper, we use the following notation (see Table 1).
$f_{\theta_o}(x)$: DNN surrogate model for the objective function; $f_{\theta_c}(x)$: DNN classifier neural network for the feasible region.

DNN
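The general setup for a DNN is reviewed as follows. A fully connected DNN of $H$ layers is denoted by
$$f_\theta(x) = W^{[H-1]}\sigma\circ\big(W^{[H-2]}\sigma\circ\big(\cdots\big(W^{[1]}\sigma\circ(W^{[0]}x+b^{[0]})+b^{[1]}\big)\cdots\big)+b^{[H-2]}\big)+b^{[H-1]},$$
where $\sigma$ is the activation function and "$\circ$" means entry-wise operation. The set of parameters of the DNN is denoted by $\theta=(W^{[0]},W^{[1]},\ldots,W^{[H-1]},b^{[0]},b^{[1]},\ldots,b^{[H-1]})$. For the regression problem of fitting a training set $\{(x_i,y_i)\}_{i=1}^{n}$, where $x_i\in\mathbb{R}^d$ and $y_i\in\mathbb{R}$ for each $i$, the commonly used loss functions are the mean-square error (MSE), that is,
$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\big(f_\theta(x_i)-y_i\big)^2,$$
and the root-mean-square error (RMSE), that is,
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(f_\theta(x_i)-y_i\big)^2}.$$
We use MSE when training the DNN to fit the objective function, and use RMSE to measure the training error and test error of the DNN.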
For the classification problem of fitting a training set $\{(x_i,q_i)\}_{i=1}^{n}$, where $x_i\in\mathbb{R}^d$ and $q_i\in\{0,1\}$ for each $i$, the loss function we use is the binary cross-entropy (BCE), that is,
$$\mathrm{BCE}=-\frac{1}{n}\sum_{i=1}^{n}\Big[q_i\log f_\theta(x_i)+(1-q_i)\log\big(1-f_\theta(x_i)\big)\Big].$$
In the cases given in this paper, the activation function is fixed to the GELU function. GELU is a smooth non-saturating activation function that can alleviate gradient vanishing. Empirically, a GELU DNN is efficient to train and generalizes well for the smooth problems we consider. Note that one can also consider other smooth non-saturating activation functions, like Swish, ELU and SELU, to achieve similar training and generalization performance.
During the training of a neural network, the parameters of the DNN are updated in each epoch by a gradient-based optimization algorithm, e.g., gradient descent (GD), stochastic gradient descent (SGD) or Adam. To speed up the training process, we update the parameters of the DNN using Adam [13].
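As a minimal illustration of the quantities named in this section, the sketch below writes out the exact GELU activation and the MSE, RMSE and BCE losses in plain Python. The function names and the clipping constant `eps` are assumptions made for this example; in practice these would come from a deep learning framework rather than be written by hand.

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def mse(preds, targets):
    """Mean-square error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)

def rmse(preds, targets):
    """Root-mean-square error, the square root of the MSE."""
    return math.sqrt(mse(preds, targets))

def bce(probs, labels, eps=1e-12):
    """Binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, q in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += q * math.log(p) + (1 - q) * math.log(1.0 - p)
    return -total / len(labels)
```

For instance, `bce([0.5, 0.5], [0, 1])` evaluates to log 2, the loss of a classifier that is maximally uncertain on every sample.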

Langevin Monte Carlo (LMC)
There are many mature methods to sample data from a desired probability distribution, such as Markov chain Monte Carlo, Metropolis-Hastings, Hamiltonian Monte Carlo and Split Monte Carlo. For convenience, in our experiments, we use overdamped Langevin Monte Carlo (LMC).
LMC is a common method to sample data following a Boltzmann distribution [5,6,8,19]. This method is based on evolving a stochastic differential equation (SDE), that is,
$$dx=-\nabla E(x)\,dt+\sqrt{2/\beta}\,dW,$$
where $\beta$ is a positive hyperparameter and $W$ is the Brownian motion. The steady-state distribution of this SDE is proportional to $e^{-\beta E(x)}$ and it satisfies the detailed balance condition. The set of long-time solutions of the SDE follows the Boltzmann distribution $\sim e^{-\beta E(x)}$. We use the first-order Euler-Maruyama scheme to solve the SDE, i.e., each data point $x$ is updated according to
$$x_{t+1}=x_t-\alpha\nabla E(x_t)+\sqrt{2\alpha/\beta}\,\xi_t,$$
where $\xi_t\sim N(0_d,I_d)$. Note that we choose an appropriate energy function $E(x)$ for each task. In our experiments, we use $E(x)=(f_{\theta_c}(x)-0.5)^2$ to sample data concentrated around the boundary of the predicted-feasible region and use $E(x)=(f_{\theta_c}(x)-1)^2$ to efficiently sample more feasible data, where $f_{\theta_c}(x)$ denotes the DNN classifier.
A simple version of LMC method is shown in algorithm 1.
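The Euler-Maruyama update can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's algorithm 1: the function name `lmc_sample`, the quadratic toy energy $E(x)=|x|^2/2$ (whose Boltzmann distribution is a Gaussian with variance $1/\beta$) and all numerical settings are chosen here for demonstration.

```python
import math
import random

def lmc_sample(grad_energy, x0, alpha, beta, n_steps, rng):
    """Run one overdamped Langevin chain with the Euler-Maruyama update
    x_{t+1} = x_t - alpha * grad E(x_t) + sqrt(2 * alpha / beta) * xi_t."""
    x = list(x0)
    noise_scale = math.sqrt(2.0 * alpha / beta)
    path = []
    for _ in range(n_steps):
        g = grad_energy(x)
        x = [xi - alpha * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
        path.append(list(x))
    return path

def grad_quadratic(x):
    """Gradient of the toy energy E(x) = |x|^2 / 2 (an assumption)."""
    return list(x)

rng = random.Random(0)
path = lmc_sample(grad_quadratic, x0=[0.0], alpha=0.1, beta=4.0,
                  n_steps=50000, rng=rng)
samples = [p[0] for p in path[1000:]]  # drop a burn-in segment
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# var should be close to 1 / beta = 0.25, up to a small step-size bias
```

Replacing `grad_quadratic` by the gradient of $(f_{\theta_c}(x)-0.5)^2$ or $(f_{\theta_c}(x)-1)^2$ would give the two samplers described above; with a DNN classifier that gradient would typically be obtained by automatic differentiation.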

Data-informed deep optimization
In this section, we introduce the framework of the data-informed deep optimization (DiDo) approach for solving high-dimensional optimization problems in which the objective function and the constraints are available only through samples, without explicit formulas. For such high-dimensional data-informed problems, the DiDo approach offers capabilities that traditional optimization approaches lack.

Data-informed problem formulation
The data-informed optimization problem is formulated as follows.
$$\min_{x} f(x)\quad\text{s.t. } x\in\Omega,\tag{3.1}$$
where the objective function $f(x)$ and the feasible region $\Omega$ are implicit and can only be evaluated through simulation at certain sampling points. In practice, $\Omega$ is often defined by a series of implicit constraints as $\Omega=\{x\mid f_i(x)\le 0,\ i=1,2,\ldots,L\}$.

Deep optimization approach
We propose a deep optimization approach to solve the data-informed optimization problem (see Fig. 1 for a flow chart). Our general idea for dealing with a problem with an implicit objective function and feasible region is to fit them by DNN surrogates from data. We can then optimize this problem with a common gradient-based method. Note that the training of the objective function relies on an explicit and accurate surrogate of the feasible region for generating high-quality training samples that cover the whole feasible region well. Therefore, we first train a DNN classifier $f_{\theta_c}(x)$ through an iterative process from an initial sample set. Then we use $f_{\theta_c}(x)$ to generate random samples from the feasible region and evaluate the corresponding values of the objective function by simulation, which gives the training data set $D_{\mathrm{obj}}=\{(x_i,f(x_i))\}$. By fitting $D_{\mathrm{obj}}$, we obtain a DNN $f_{\theta_o}(x)$ as a surrogate of the objective function $f(x)$. Finally, by optimizing $f_{\theta_o}(x)$ with the surrogate constraint $f_{\theta_c}(x)\ge 0.5$ (where $f_{\theta_c}(x)=0.5$ is regarded as the surrogate boundary), we obtain candidates of the optimal parameters of problem (3.1), which should be close to the true optimal parameters of the problem.

Fitting feasible region
Generally, without an explicit feasible region, it is difficult to generate well-distributed feasible training samples, especially in a high-dimensional problem. With blind sampling, training samples are likely to be far from the decision boundary, i.e., the boundary of the feasible region, resulting in an inaccurate fit of the DNN classifier. To overcome this difficulty in our deep optimization approach, we propose an iterative method which adds new samples around the boundary of the current DNN classifier and retrains it at each iteration. Using this approach, we can efficiently obtain a good DNN classifier $f_{\theta_c}(x)\in[0,1]$ through several rounds of iteration.
Initially, we uniformly sample $X_I^{\mathrm{ini}}$ in a selected region $B$ based on prior knowledge of the considered problem and train the classifier $f_{\theta_c}(x)$. Empirically, balancing the feasible and infeasible points benefits the performance of the classifier. Note that many problems whose optimal parameters are close to the boundary of the feasible region require a highly accurate DNN classifier (see the example in Fig. 6). We propose an iterative method to efficiently improve the accuracy of the classifier $f_{\theta_c}(x)$.
For a stopping criterion, it is crucial to determine whether the surrogate boundary is close to the true boundary, e.g., whether their "mean distance" is smaller than a certain tolerance $\epsilon$. Intuitively, for any point on the surrogate boundary, if its distance to the true boundary is larger than $\epsilon$, then the prediction accuracy of the DNN classifier in the $\epsilon$-neighborhood of this point is roughly 50% (see Fig. 2(a) for illustration); otherwise, if the distance is much smaller than $\epsilon$, then the prediction accuracy in the $\epsilon$-neighborhood should be close to 1 (see Fig. 2(b) for illustration). Therefore, we sample some points close to the surrogate boundary by the LMC method (see algorithm 1 for details) and perturb them by Gaussian noise with covariance matrix $\sigma^2 I_d$, where $\sigma$ is roughly $\epsilon/\sqrt{d}$ due to concentration in the equator [2]. When the prediction accuracy of the classifier on these points is higher than an expected value, say 95%, we stop the iteration.
The detail of our iterative training algorithm is shown in algorithm 2.
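The stopping criterion can be illustrated on a two-dimensional toy case in which both the true and the surrogate feasible regions are discs. The sketch below is a hypothetical stand-in for the real procedure: the helper names, the radii and the noise level `sigma` are assumptions, and boundary points are drawn directly on the surrogate circle rather than by LMC.

```python
import math
import random

def perturbed_accuracy(surrogate_label, true_label, boundary_points, sigma, rng):
    """Fraction of Gaussian-perturbed boundary points on which the
    surrogate label agrees with the true label."""
    agree = 0
    for x in boundary_points:
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        agree += int(surrogate_label(y) == true_label(y))
    return agree / len(boundary_points)

def circle_points(radius, n, rng):
    """Points drawn uniformly on a circle, standing in for LMC boundary samples."""
    pts = []
    for _ in range(n):
        a = rng.uniform(0.0, 2.0 * math.pi)
        pts.append([radius * math.cos(a), radius * math.sin(a)])
    return pts

def inside(radius):
    """Label function of a disc: True if the point lies within the radius."""
    return lambda x: math.hypot(x[0], x[1]) <= radius

rng = random.Random(0)
true_label = inside(1.0)
sigma = 0.03
# Surrogate boundary far from the true one (radius 0.5 versus 1.0).
acc_far = perturbed_accuracy(inside(0.5), true_label,
                             circle_points(0.5, 4000, rng), sigma, rng)
# Surrogate boundary almost on the true one (radius 1.0005 versus 1.0).
acc_near = perturbed_accuracy(inside(1.0005), true_label,
                              circle_points(1.0005, 4000, rng), sigma, rng)
stop_iteration = acc_near >= 0.95
```

In the far case roughly half of the perturbed points are misclassified, whereas in the near case almost all of them are classified correctly, which is the contrast the criterion relies on.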

Fitting objective function
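For a high-dimensional large-scale problem with an implicit boundary, it is difficult to efficiently sample diverse training data. However, with the explicit classifier obtained above, we can use LMC with the energy function $E(x)=(f_{\theta_c}(x)-1)^2$ to generate high-quality training samples $D_{\mathrm{obj}}=\{(x_i,f(x_i))\}$ that cover the feasible region of the considered problem well. The detail of fitting the objective function is shown in algorithm 3.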

Algorithm 3: Fitting of objective function
Data: classifier $f_{\theta_c}(x)$; a non-empty feasible set of $f_{\theta_c}(x)$: $X_S^{\mathrm{ini}}$; a large enough number $n_t$; energy function $E(x)=(f_{\theta_c}(x)-1)^2$.
2 Use the LMC method with $E(x)$ to sample data following the distribution $\sim e^{-\beta E(x)}$ and obtain $X_S^T$;
3 Select the data in the feasible region $\Omega$ of the real system: $X_o=\Omega\cap X_S^T$;
4 Obtain training data for the objective function: $D_{\mathrm{obj}}=\{(x_i,f(x_i))\}$ with $x_i\in X_o$.
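The steps of algorithm 3 can be sketched on a two-dimensional toy problem. Everything concrete below is an assumption made for illustration: the sigmoid stand-in for $f_{\theta_c}$ with sharpness `K`, the unit disc as the true feasible region, the stand-in objective, and the values of `alpha`, `beta` and the chain length. In the actual setting the feasibility check and the objective value come from the simulator.

```python
import math
import random

K = 10.0  # sharpness of the stand-in classifier (assumption)

def classifier(x):
    """Stand-in for f_theta_c: near 1 inside the unit disc, 0.5 on its boundary."""
    return 1.0 / (1.0 + math.exp(-K * (1.0 - x[0] ** 2 - x[1] ** 2)))

def grad_energy(x):
    """Gradient of E(x) = (f_c(x) - 1)^2 for the stand-in classifier."""
    p = classifier(x)
    # d f_c / d x_i = p * (1 - p) * K * (-2 * x_i)
    return [2.0 * (p - 1.0) * p * (1.0 - p) * K * (-2.0 * xi) for xi in x]

def lmc(x0, alpha, beta, n_steps, rng):
    """Final state of one Euler-Maruyama Langevin chain started at x0."""
    x = list(x0)
    scale = math.sqrt(2.0 * alpha / beta)
    for _ in range(n_steps):
        g = grad_energy(x)
        x = [xi - alpha * gi + scale * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
    return x

def true_feasible(x):
    """Stand-in for the simulator's feasibility check."""
    return x[0] ** 2 + x[1] ** 2 <= 1.0

def objective(x):
    """Stand-in for the simulated objective value f(x)."""
    return x[1] ** 2 - x[0] ** 2

rng = random.Random(0)
x_init = [[0.5, 0.0], [0.0, -0.5], [-0.3, 0.3]]            # X_S^ini
x_sampled = [lmc(rng.choice(x_init), 0.01, 50.0, 200, rng)
             for _ in range(300)]                           # X_S^T
x_o = [x for x in x_sampled if true_feasible(x)]            # X_o
d_obj = [(x, objective(x)) for x in x_o]                    # D_obj
```

Filtering by the true feasibility check after sampling matters because the classifier is only a surrogate: points it accepts may still be infeasible for the real system.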

Deep optimization
Based on the accurate DNN surrogate models of the constraints and the objective function obtained above, the data-informed optimization problem (3.1) turns into the following explicit optimization problem,
$$\min_{x} f_{\theta_o}(x)\quad\text{s.t. } f_{\theta_c}(x)\ge 0.5,\tag{3.2}$$
where 0.5 is the threshold of the DNN classifier $f_{\theta_c}(x)\in[0,1]$ for prediction. Problem (3.2) is a conventional optimization problem with constraints. To solve it, we first rewrite it as an unconstrained problem, making the inequality constraint implicit in the objective,
$$\min_{x} f_{\theta_o}(x)+I_-\big(0.5-f_{\theta_c}(x)\big),$$
where $I_-:\mathbb{R}\to\mathbb{R}$ is the indicator function for the non-positive real numbers,
$$I_-(u)=\begin{cases}0, & u\le 0,\\ \infty, & u>0.\end{cases}$$
However, the indicator function $I_-$ is not differentiable, so we approximate it by a "soft" function. For example, we use the interior-point method. The basic idea of the interior-point method is to approximate the indicator function $I_-(u)$ by a barrier function; a common choice is the logarithmic barrier, $-\frac{1}{t}\log(-u)$, where $t>0$ is a hyperparameter that sets the accuracy of the approximation [3].
Substituting $I_-(u)$ with $-\frac{1}{t}\log(-u)$ gives the approximation
$$\min_{x} f_{\theta_o}(x)-\frac{1}{t}\log\big(f_{\theta_c}(x)-0.5\big).\tag{3.3}$$
To solve problem (3.3), we use gradient descent (GD) for convenience. Although simple, GD is often an effective optimization algorithm in DiDo. The deep optimization is summarized in algorithm 4. The proposed methodology gives a schematic process to search for candidates of optimal parameters (see Fig. 1) for high-dimensional optimization problems with an implicit feasible region and objective function. As we will show, it is well suited to data-driven inference using deep neural networks, which can be differentiated efficiently.
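A small sketch of gradient descent on a log-barrier objective of the form $f_{\theta_o}(x)-\frac{1}{t}\log(f_{\theta_c}(x)-0.5)$ is given below. The two-dimensional stand-ins for $f_{\theta_o}$ and $f_{\theta_c}$, their hand-written gradients, the constants `K` and `T`, and the backtracking rule that keeps iterates strictly inside the surrogate feasible region are all assumptions made for illustration; with DNN surrogates the gradients would typically come from automatic differentiation.

```python
import math

K = 10.0    # sharpness of the stand-in classifier (assumption)
T = 100.0   # barrier parameter t

def f_o(x):
    """Stand-in surrogate objective."""
    return x[1] ** 2 - x[0] ** 2

def grad_f_o(x):
    return [-2.0 * x[0], 2.0 * x[1]]

def f_c(x):
    """Stand-in classifier, written as a numerically stable sigmoid."""
    z = K * (1.0 - x[0] ** 2 - x[1] ** 2)
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def grad_f_c(x):
    p = f_c(x)
    return [p * (1.0 - p) * K * (-2.0 * xi) for xi in x]

def barrier_objective(x):
    margin = f_c(x) - 0.5
    if margin <= 0.0:
        return math.inf  # outside the surrogate feasible region
    return f_o(x) - math.log(margin) / T

def barrier_gradient(x):
    margin = f_c(x) - 0.5
    return [go - gc / (T * margin) for go, gc in zip(grad_f_o(x), grad_f_c(x))]

def gradient_descent(x0, n_iters=5000, step0=0.1):
    """Gradient descent with backtracking (Armijo) step-size selection."""
    x = list(x0)
    for _ in range(n_iters):
        g = barrier_gradient(x)
        g2 = sum(gi * gi for gi in g)
        fx = barrier_objective(x)
        step = step0
        while step > 1e-12:
            cand = [xi - step * gi for xi, gi in zip(x, g)]
            if barrier_objective(cand) <= fx - 1e-4 * step * g2:
                x = cand
                break
            step *= 0.5
        else:
            break  # no acceptable step was found
    return x

x_star = gradient_descent([0.3, 0.2])
```

Starting from a strictly feasible point, the iterate moves toward the surrogate boundary and settles slightly inside it, since the barrier term grows without bound as $f_{\theta_c}(x)$ approaches 0.5.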
Note that even when the feasible region can be characterized analytically by a set of equations, we can still train a DNN surrogate to represent it. In such cases, our approach can still bring benefits: for example, a DNN classifier softens the boundary of the feasible region and makes it easy to determine the normal vector of the boundary.

Optimal rotor profile
In this section, we apply the DiDo approach to solve an engineering design problem to show its effectiveness.

Problem description
Screw compressors are widely used in refrigeration, mining, petrochemical and other industries because of their high reliability, good power balance, low leakage and high efficiency. The rotor profile is the core component of a twin-screw compressor, so optimizing its design would greatly benefit the mechanical performance of the compressor. The rotor profile is formed by several arcs and arc envelopes that are smoothly connected. Empirically, we can parameterize the rotor profile by 6 parameters, $x=[r,r_3,r_o,r_{o2},u_1,R]\in\mathbb{R}^6$, where $r$, $r_3$, $r_o$, $r_{o2}$, $R$ are radii of the arcs and $u_1$ is an angle [22,23]. Then, the optimization of the rotor profile becomes an optimization problem w.r.t. the 6 parameters.
In our example, the performance of a design, consisting of the 6 design parameters, is measured by the actual flow of the rotor, which is an important performance indicator for large compressors and is obtained through computational fluid dynamics (CFD) simulation. Our goal is to find a rotor profile that maximizes the actual flow.
Note that not all parameters in $\mathbb{R}^6$ are feasible for the design. They should satisfy a set of implicit constraints related to geometrical properties of the rotor. Therefore, both the objective and the constraint functions are data-informed, i.e., they are only available on a set of data points through simulation. In the following, we demonstrate the effectiveness of our DiDo approach on this problem.

Feasible region learned by a DNN classifier
In this example, we first use the iterative training algorithm in algorithm 2 to train the DNN classifier $f_{\theta_c}(x)$, which is a fully connected DNN with hidden layer sizes 800-600-400-200, equipped with a sigmoid function at the output layer. Without loss of generality, we choose 0.5 as the threshold to determine the surrogate feasible region, i.e., $f_{\theta_c}(x)\ge 0.5$.
Note that we carefully choose the initial sample region $B$ such that the numbers of feasible and infeasible points are balanced in the initial training data. To make DNN training effective, we normalize each parameter to an input variable with mean zero and variance one.
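The normalization step can be written as a short helper. The sketch below is an assumption about how such a standardization might be coded; the function names and the made-up parameter rows are purely illustrative, and a column with zero spread would need separate handling.

```python
import math

def fit_standardizer(rows):
    """Per-column mean and (population) standard deviation of parameter vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(d)]
    return means, stds

def standardize(rows, means, stds):
    """Shift and scale every column to mean zero and variance one."""
    return [[(r[j] - means[j]) / stds[j] for j in range(len(r))] for r in rows]

raw = [[10.0, 0.2], [12.0, 0.4], [14.0, 0.9]]  # made-up design parameters
means, stds = fit_standardizer(raw)
z = standardize(raw, means, stds)
```

The same `means` and `stds` fitted on the training data would then be reused for any new point passed to the classifier or the surrogate, so that all inputs share one scale.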
We set the initial sample size to $n_0=8000$ and add $n_1=5000$ samples in each iteration. With algorithm 2, we can obtain a well-trained classifier $f_{\theta_c}(x)$.
To show the effectiveness of the iterative training, we report, at each iteration, the accuracy of the DNN classifier on samples at the surrogate boundary perturbed by Gaussian noise. In Fig. 3(a), each curve gives the accuracy of the classifier w.r.t. the noise standard deviation; along each curve, the accuracy increases as the perturbation noise increases. This indicates that the classifier is more accurate on samples that deviate more from the boundary, which provides a rationale for our iterative training algorithm focusing on the boundary. Comparing different iterations, indicated by differently colored curves, the classifier improves as the iteration proceeds and the number of training samples increases. For example, see Fig. 3(b).

Objective function learned by a DNN model
We use a DNN surrogate to fit the objective function, i.e., a mapping from a designed rotor profile to the actual flow. By algorithm 3, we use the classifier $f_{\theta_c}(x)$ obtained above to generate a training set $D_{\mathrm{obj}}$ of size $n_t=500$ and train a GELU-DNN $f_{\theta_o}(x)$ with hidden layer sizes 1024-512-256-128. The test accuracy of the DNN $f_{\theta_o}(x)$ is evaluated on a test data set consisting of 2000 samples.
As shown in Fig. 4, after training, the normalized RMSE training error is $\sim 0.01$ whereas the normalized RMSE test error is $\sim 0.04$.

Deep optimization
We then solve the problem with the data-informed deep optimization approach in algorithm 4, using $f_{\theta_o}(x)$ and $f_{\theta_c}(x)$.
The optimum of this optimization problem may not be unique, and there could be multiple local optima. Therefore, we solve the problem by gradient descent from various initial points to search for a global optimum. For visualization, in Fig. 5, we show the distribution of the actual flow of the training samples used for learning the DNN surrogate and of a set of truly feasible candidates of optimal profile parameters. Note that the maximal actual flow of the training samples is approximately 1256. After solving the optimization problem, we obtain a set of candidates of optimal profile parameters. We then examine with the simulator whether those parameters are in the true feasible region, and calculate the actual flow of the feasible candidates with the CFD simulator. The best actual flow we achieve is roughly 1400, which is better than both the values obtained by manually tuning parameters and the maximal actual flow of the training samples, 1256. The candidates of optimal profile parameters outperform the training samples in terms of actual flow: most of the candidates have an actual flow larger than 1340. Furthermore, it is interesting to analyze the candidates of optimal parameters obtained using our DiDo approach. We analyze the distance between the candidates and the boundary of the feasible region by computing the probability predicted by the classifier $f_{\theta_c}(x)$. In Fig. 6, each point corresponds to one candidate of optimal parameters. The values $f_{\theta_c}(x)$ (ordinate) of the obtained candidates deviate significantly from 1, i.e., most candidates, across the different actual flows predicted by the DNN surrogate (abscissa), are close to the surrogate boundary. Moreover, many of the candidates are outside the true feasible region as examined by the simulator, i.e., these candidates are falsely classified as feasible by the neural network (see the yellow dots in Fig. 6). Therefore these candidates are close to the true boundary. For such a problem, obtaining an accurate surrogate classifier is key to our optimization. Therefore, our iterative training algorithm, which can adaptively improve the accuracy of the DNN classifier, is a key procedure for the good performance of our DiDo approach.

Toy Example: Harmonic function
To verify the validity of our method in solving high-dimensional data-informed optimization problems, and inspired by the practical problem of rotor design, we construct a 100-dimensional optimization problem whose optimal points are located on the boundary of the feasible region.

Problem description
We consider an optimization problem in which the objective function $f(x)$ is a harmonic function and the feasible region is the unit ball. The toy optimization problem is as follows,
$$\min_{x} f(x)\quad\text{s.t. } \|x\|_2\le 1,\tag{5.1}$$
where $x=(x_1,\ldots,x_d)^T\in\mathbb{R}^d$. For demonstration, we take $d=100$. Note that the harmonic function $f(x)$ satisfies the extremum principle, which indicates that the minimum of problem (5.1) is achieved on the boundary. For the given case, the minimum $-1$ is attained at $x=(1,0,\ldots,0)^T$ and $x=(-1,0,\ldots,0)^T$. Note that although the objective function and the constraints are analytically known, we treat them as if they could only be measured through sampling.

Feasible region learned by a DNN classifier
As in the rotor problem, and with the same settings, we first train a DNN classifier to learn the feasible region. We set the initial sample size to $n_0=3000$, the initial sample region to $B=[-0.173,0.173]^{100}$, and add $n_1=5000$ samples in each iteration. By algorithm 2, we obtain a well-trained classifier $f_{\theta_c}(x)\in[0,1]$. We use the surrogate feasible region $\{x\mid f_{\theta_c}(x)\ge 0.5\}$ to represent the true feasible region $\Omega$.
During the iterative training, the accuracy of the classifier under Gaussian noise perturbation improves efficiently, as shown in Fig. 7(a). In addition, for this toy example, we know that the real feasible region is a unit ball, so the boundary is easy to visualize along the radial direction. Thus, we calculate $f_{\theta_c}(rx)$, where $x$ is uniformly sampled on the real boundary of the feasible region and $r$ follows the uniform distribution on the interval $[0,2]$. As shown in Fig. 7(b), throughout the iterative training, the surrogate classifier approximates the true feasible region $I(r\le 1)$ better and better.
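The radial check described above is straightforward to sketch. In the example below, `uniform_on_sphere`, `radial_profile` and the sigmoid `stand_in_classifier` are hypothetical names and choices made for illustration; the stand-in replaces the trained DNN classifier, and points on the unit sphere are obtained by normalizing standard Gaussian vectors.

```python
import math
import random

def uniform_on_sphere(d, rng):
    """Uniform point on the unit sphere in R^d (normalized Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def radial_profile(classifier, d, n, rng):
    """Pairs (r, f_c(r * x)) with x uniform on the unit sphere and r ~ U[0, 2]."""
    out = []
    for _ in range(n):
        x = uniform_on_sphere(d, rng)
        r = rng.uniform(0.0, 2.0)
        out.append((r, classifier([r * v for v in x])))
    return out

def stand_in_classifier(x, k=20.0):
    """Smooth stand-in for f_theta_c with its 0.5 level set on the unit sphere."""
    z = k * (1.0 - math.sqrt(sum(v * v for v in x)))
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
profile = radial_profile(stand_in_classifier, d=100, n=500, rng=rng)
# Agreement of the thresholded classifier with the indicator I(r <= 1).
agreement = sum((p >= 0.5) == (r <= 1.0) for r, p in profile) / len(profile)
```

Plotting the classifier value against $r$ for successive classifiers would give curves like those in Fig. 7(b), which sharpen toward the step function $I(r\le 1)$ as training proceeds.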

Objective function learned by a DNN model
By algorithm 3, we use the classifier $f_{\theta_c}(x)$ obtained above to generate a training set $D_{\mathrm{obj}}$ of size $n_t=5000$ and train a GELU-DNN $f_{\theta_o}(x)$ with hidden layer sizes 2000-1000-600-400-200. The test accuracy of the DNN $f_{\theta_o}(x)$ is evaluated on a test set consisting of 2000 samples. As shown in Fig. 8, after training, the normalized RMSE training error is $\sim 0.01$ whereas the normalized RMSE test error is $\sim 0.04$.

Deep optimization
With DNN surrogate f θ o (x) and the well-trained DNN classifier f θ c (x), by algorithm 4, we obtain a set of candidates from different initial points.Note that we set the training

Conclusion and discussion
Data-informed optimization problems are common in science and industry. Though intuitive, our DiDo approach provides a promising way to solve high-dimensional data-informed optimization problems. For high-dimensional optimization problems whose optimal points are located in the interior of the feasible region, e.g., maximizing a Gaussian function in a unit ball, we find that it is more difficult to sample sufficiently many useful points to fit the objective function well. This difficulty is due to concentration phenomena in high-dimensional space [2]. For example, if we uniformly sample data in a unit ball, the samples concentrate in a shell of width $O(1/d)$ at the surface. In practice, this can be alleviated by using a proper sampling distribution, say radial uniform sampling, chosen according to prior knowledge.
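The concentration effect is easy to check numerically. For a point drawn uniformly from the $d$-dimensional unit ball, the radius $R$ satisfies $P(R\le r)=r^d$, so $R$ can be simulated as $U^{1/d}$ with $U$ uniform on $[0,1]$. The sketch below compares this with radial uniform sampling; the shell width $3/d$ and the sample sizes are arbitrary choices made for illustration.

```python
import random

def uniform_ball_radius(d, rng):
    """Radius of a point drawn uniformly from the d-dimensional unit ball."""
    return rng.random() ** (1.0 / d)

def radial_uniform_radius(rng):
    """Radius under radial uniform sampling: r ~ U[0, 1]."""
    return rng.random()

rng = random.Random(0)
d = 100
shell = 1.0 - 3.0 / d  # inner radius of a thin shell of width 3 / d

ball = [uniform_ball_radius(d, rng) for _ in range(10000)]
frac_shell_ball = sum(r > shell for r in ball) / len(ball)

radial = [radial_uniform_radius(rng) for _ in range(10000)]
frac_shell_radial = sum(r > shell for r in radial) / len(radial)
```

With $d=100$, the large majority of uniformly sampled points fall in the thin outer shell, whereas radial uniform sampling places only a few percent there, leaving far more samples in the interior.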
In the iterative training process, the hyperparameter $\beta$ in LMC is important for sampling diverse points close to the surrogate boundary. If $\beta$ is too large, we observe that the added points concentrate at the surrogate decision boundary and the new classifier can even become less accurate. This phenomenon is related to the frequency principle, i.e., the points close to the boundary are high-frequency in nature and thus may result in worse generalization performance [24]. Empirically, a proper $\beta$ is needed for a steady improvement in the accuracy of the DNN classifier.

Figure 1: The flow chart of the DiDo approach.

We denote the classifier at iteration step $t$ by $f_{\theta_c^{(t)}}(x)$. For a classification problem, the points close to the decision boundary are generally of crucial importance in determining the classifier, e.g., the support vectors of a support vector machine (SVM). Therefore, at each iteration step, we add new training data sampled near the decision boundary of the classifier $f_{\theta_c^{(t)}}(x)$ by the LMC method (see algorithm 1 for details) and train a new classifier.

Figure 2: Schematic diagram. (a) For the red point on the surrogate boundary, the distance to the true boundary is larger than $\epsilon$, and the prediction accuracy is roughly 50%; (b) for the red point on the surrogate boundary, the distance to the true boundary is much smaller than $\epsilon$, and the prediction accuracy is close to 1.

Algorithm 4: Deep optimization
Data: $f_{\theta_c}(x)$: well-trained DNN classifier; $f_{\theta_o}(x)$: DNN surrogate model for fitting the objective function.
Result: Candidates of optimal parameters
1 Substitute $f_{\theta_o}(x)$ and $f_{\theta_c}(x)$ into problem (3.3);
2 Solve problem (3.3) by gradient-descent-based optimization algorithms, such as gradient descent (GD);
3 Get candidates of optimal parameters of problem (3.1).

As shown in Fig. 3(b), for a fixed noise with variance 0.1, the accuracy of the classifier increases almost monotonically with the size of the training set.

Figure 3: (a) Classification accuracy of the DNN classifier on the perturbed points during iteration. Note that, at each iteration $t$, we apply an extra constraint $|f^{(t)}_{\theta_c}(x_i)-0.5|\le 0.1$ to the points sampled by LMC. In the two panels, the label "accuracy" means classification accuracy after perturbation. As we add more data, the perturbation magnitude at which the classifier accuracy on the perturbed points reaches 100% gets smaller, which means the classifier performs better. (b) Classification accuracy of the DNN classifier for a fixed standard deviation of the perturbation, where the variance is $\sigma^2=0.1$. The classification accuracy improves as we update the DNN classifier.

Figure 4: Training loss and test loss during the training of the DNN surrogate $f_{\theta_o}(x)$ for the optimal rotor profile problem.

Figure 5: The distribution of the simulated actual flow values of the sampled data used for training the DNN surrogate and of the candidates of optimal parameters finally obtained. The light green bars correspond to the training samples and the dark green bars correspond to the candidates of optimal parameters.

Figure 6: The classifier value $f_{\theta_c}(x)$ and the actual flow predicted by the DNN surrogate $f_{\theta_o}(x)$ for the candidates of optimal parameters. The red solid line corresponds to the probability 0.5. Both the blue and the yellow dots are predicted to be feasible by the DNN and lie above the solid red line. However, the yellow points are outside the true boundary.

Figure 7: The performance of the classifier. (a) Classification accuracy of the DNN classifier on the perturbed points during iteration. Note that not all iterations are shown and that, at each iteration $t$, we apply an extra constraint $|f^{(t)}_{\theta_c}(x_i)-0.5|\le 0.1$ to the points sampled by LMC. In the two panels, the label "accuracy" means classification accuracy after perturbation. As we add more data, the perturbation magnitude at which the classifier accuracy on the perturbed points rises sharply from 50% gets smaller, which means the distance between the true boundary and the surrogate boundary gets smaller, i.e., the classifier performs better. (b) The classifier values on points uniformly distributed along the radial direction. As the iteration proceeds, the classifier gets closer to the real classification function $I(r\le 1)$.

Figure 8: Training loss and test loss during the training of the DNN surrogate $f_{\theta_o}(x)$ for the toy example.

Figure 9: The distribution of the objective function values. Comparison between the objective function values at the initial points, i.e., the training samples used for learning the DNN surrogate, and those at the final candidates of optimal parameters.

Table 1: Notation
$I_\Omega(x)$: indicator function of the region $\Omega$, i.e., $I_\Omega(x)=1$ if $x\in\Omega$; otherwise, $I_\Omega(x)=0$.
$E(x)$: energy function used in the LMC method; with $E(x)=(f_{\theta_c}(x)-1)^2$ it is used to generate high-quality training samples $D_{\mathrm{obj}}=\{(x_i,f(x_i))\}$ covering the feasible region of the considered problem well.
$\sigma$: standard deviation of the noise term.
$\beta$: positive hyperparameter used in the LMC method.

Algorithm 2 (recoverable steps only)
Result: Good classifier.
Train the classifier on $D^{(t)}$ with Adam;
Use the LMC method with proper initialization and $E_t(x)$ to sample $n_1$ data following the distribution $\sim e^{-\beta E_t(x)}$ and obtain the input set $X_I$;
Perturbation: perturb $X_I$ and add it to the training data $D$.