
Optimization of non-smooth functions via differentiable surrogates

Abstract

Mathematical optimization is fundamental across many scientific and engineering applications. While data-driven models like gradient boosting and random forests excel at prediction tasks, they often lack mathematical regularity, being non-differentiable or even discontinuous. These models are commonly used to predict outputs based on a combination of fixed parameters and adjustable variables. A key transition in optimization involves moving beyond simple prediction to determine optimal variable values. Specifically, the challenge lies in identifying values of adjustable variables that maximize the output quality according to the model’s predictions, given a set of fixed parameters. To address this challenge, we propose a method that combines XGBoost’s superior prediction accuracy with neural networks’ differentiability as optimization surrogates. The approach leverages gradient information from neural networks to guide SLSQP optimization while maintaining XGBoost’s prediction precision. Through extensive testing on classical optimization benchmarks including Rosenbrock, Levy, and Rastrigin functions with varying dimensions and constraint conditions, we demonstrate that our method achieves solutions up to 40% better than traditional methods while reducing computation time by orders of magnitude. The framework consistently maintains near-zero constraint violations across all test cases, even as problem complexity increases. This approach bridges the gap between model accuracy and optimization efficiency, offering a practical solution for optimizing non-differentiable machine learning models that can be extended to other tree-based ensemble algorithms. The method has been successfully applied to real-world steel alloy optimization, where it achieved superior performance while maintaining all metallurgical composition constraints.

Introduction

Data-driven methods have established themselves as powerful tools for modeling complex systems across various scientific domains [1–3]. These methods, such as gradient boosting, random forests, and neural networks, effectively capture complex relationships between input variables and output responses without requiring explicit mathematical formulations. The effectiveness of these approaches derives from their ability to learn patterns directly from data, making them particularly valuable when analytical solutions are intractable or when the underlying physical mechanisms remain poorly understood.

In optimization applications, these data-driven models present significant challenges [4]. Most modern machine learning algorithms prioritize prediction accuracy over mathematical regularity, leading to models that are often non-differentiable or even discontinuous. This characteristic makes traditional gradient-based optimization methods inapplicable, necessitating the use of derivative-free optimization (DFO) techniques or heuristic algorithms. Although these approaches can eventually find satisfactory solutions, they generally require numerous function evaluations and provide limited convergence guarantees.

Surrogate models have long been used in optimization to address computational challenges [5–8]. These models approximate the original objective function while maintaining properties that enable efficient optimization. Traditional surrogate approaches, such as polynomial regression or radial basis functions, focus on smoothness and differentiability but often fail to capture the complexity of modern machine learning predictions [9–11]. Recent research has explored various surrogate modeling strategies, from Gaussian processes [12] to simplified neural networks [13], yet the fundamental tension between model expressiveness and optimization efficiency remains unresolved. The current state-of-the-art presents a clear trade-off: researchers must either use accurate but non-differentiable models with computationally intensive optimization procedures, or reduce model accuracy for optimization efficiency [14,15]. This dilemma becomes particularly acute in high-dimensional spaces with multiple local minima, where derivative-free methods and heuristic algorithms often struggle to find optimal solutions within reasonable computational budgets. Despite extensive research in surrogate modeling, derivative-free optimization, and heuristic algorithms [16], a gap remains in balancing model accuracy with optimization efficiency.

To address these challenges, we propose the use of independently trained differentiable machine learning models as optimization surrogates. Our approach consists of two major phases: model training and optimization. In the training phase, we construct two machine learning models – an XGBoost model for accurate prediction and a neural network as a differentiable surrogate. During optimization, we leverage the neural network’s differentiability by extracting gradient information through backpropagation, which guides the SLSQP optimizer to find optimal solutions. The final prediction is then obtained using XGBoost’s superior prediction accuracy. This framework combines the differentiability advantage of neural networks for optimization with XGBoost’s prediction capabilities, rather than attempting to make existing models differentiable or simplifying them for optimization purposes. Our key insight is that while these surrogate models may not perfectly match the predictions of the original model, they can capture enough of the underlying structure to guide the optimization process effectively while enabling the use of efficient gradient-based methods.

Our primary objective is to understand the fundamental trade-off between surrogate model accuracy and optimization efficiency. Additionally, we analyze the performance differences between our approach and traditional optimization methods, including derivative-free methods and heuristic algorithms like genetic algorithms (GA), particle swarm optimization (PSO), and simulated annealing (SA). This comparison focuses on challenging scenarios involving high dimensionality and multiple local minima. The proposed methodology and analysis framework are illustrated in Fig 1. Through comprehensive empirical evaluation on classical optimization benchmarks, we establish a framework for optimizing non-differentiable machine learning models. Our experimental results demonstrate the effectiveness of differentiable surrogates and provide practical insights for implementing these methods in complex optimization problems.

Background

Related work

Surrogate modeling is a widely used approach in optimization, especially for computationally expensive simulations or when the relationship between input and output variables is not well understood [1719]. These models function as simplified approximations of more complex systems, balancing computational efficiency with prediction accuracy. Several established methods exist for surrogate modeling. Common approaches include Kriging models [20,21], response surface models [22,23], and radial basis functions [24,25]. These methods have proven effective across various engineering applications. In one study, Koziel et al. [26] demonstrated that surrogate-based optimization in transonic airfoil design achieved optimal results with lower computational costs while maintaining robustness and scalability. Owoyele et al. [27] developed a machine learning ensemble within a surrogate-based framework that outperformed traditional genetic algorithms. Their method reduced design time by 80% in test cases involving both mathematical functions and engine fuel consumption optimization. In the engine optimization case, their approach achieved 1.9% energy savings while satisfying operability constraints and emission standards. Additionally, Thakur et al. [28] introduced a deep learning surrogate model for stochastic simulations. Their approach uses a generative network combined with a conditional maximum mean discrepancy loss function to model stochastic outputs without assumptions about probability distributions.

Surrogate models are particularly useful where detailed simulation models require excessive computational resources [29]. For instance, Nyshadham et al. [30] successfully applied surrogate machine learning models to interpolate material energies, showing their versatility in different domains. The selection of appropriate surrogate modeling techniques has traditionally depended on domain expertise [31]. While many comparative studies exist, most evaluate only specific models on limited applications [32]. In optimization applications, surrogate models offer a statistical approach by training on a limited number of simulations around the operating point [33]. This method has proven effective for problems with high-dimensional variable spaces [34,35]. Moreover, when analytical relationships between variables are unavailable or unsuitable for conventional gradient-based optimization, these models provide an effective alternative for optimization tasks.

Surrogate models have yielded exceptional outcomes across diverse applications in recent years. Ghafariasl et al. [6] developed a neural network ensemble model for a multi-generation system with values of 0.983–0.999, which enabled optimization with six conflicting objectives. Kim et al. [36] then applied an NN surrogate model to optimize jet impingement cooling systems, leading to a 140% enhancement in heat transfer coefficient at consistent computational costs. For building energy analysis, Ferreira et al. [37] created an inverse-based NN model from heating and cooling load signatures to predict building characteristics, which served as an efficient tool for energy retrofit screening. Baisthakur et al. [38] introduced a Physics-Informed NN surrogate model for wind turbine aerodynamics that delivered a forty-fold speedup versus traditional methods at equal accuracy. Li et al. [39] extended these advances through a deep neural network surrogate model for protonic ceramic electrolysis cells, which achieved a 12.8% performance gain via multi-objective optimization.

Numerical optimization

Let us consider a general constrained optimization problem that aims to minimize an objective function f(x) subject to inequality constraints, equality constraints, and bound constraints. The problem can be formulated mathematically as:

\[
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & f(x) && \text{(1a)} \\
\text{s.t.} \quad & c_i(x) = 0, \quad i \in \mathcal{E} && \text{(1b)} \\
& c_i(x) \ge 0, \quad i \in \mathcal{I} && \text{(1c)} \\
& l \le x \le u && \text{(1d)}
\end{aligned}
\]

where $f : \mathbb{R}^n \to \mathbb{R}$ represents the objective function, $\mathcal{E}$ and $\mathcal{I}$ are disjoint finite index sets, $c_i : \mathbb{R}^n \to \mathbb{R}$ denotes the constraint functions, and $l$, $u$ specify the lower and upper bounds on $x$. For concise notation, we define the feasible set $\Omega$ as all points $x$ satisfying these constraints:

\[ \Omega = \left\{ x \in \mathbb{R}^n : c_i(x) = 0,\ i \in \mathcal{E};\ c_i(x) \ge 0,\ i \in \mathcal{I};\ l \le x \le u \right\} \tag{2} \]

The problem can then be expressed simply as finding

\[ x^* = \arg\min_{x \in \Omega} f(x) \tag{3} \]

Optimization methods can be broadly categorized into derivative-based and derivative-free approaches. This distinction is significant due to their different performance characteristics. Derivative-based methods generally show faster convergence and higher accuracy, especially for high-dimensional problems [40]. However, these methods face two main limitations: they can get trapped in local minima [41], and they require the objective function to be sufficiently smooth (i.e., continuously differentiable). In comparison, derivative-free methods can handle non-differentiable functions but require more computation as dimensionality increases.

Derivative-based optimization.

Derivative-based methods iteratively update model parameters by computing gradients and taking small steps in the negative gradient direction. These methods can be categorized into two main approaches: first-order and second-order methods. First-order gradient descent methods use the gradient vector to determine the descent direction at each iteration. Notable examples include the steepest descent algorithm [42] and conjugate gradient methods [43]. Meanwhile, Newton-type methods incorporate second-order derivative information through the Hessian matrix to compute the descent direction. Quasi-Newton methods [44] and their variants, such as the Davidon-Fletcher-Powell (DFP) [45] and Broyden-Fletcher-Goldfarb-Shanno (BFGS) [46] algorithms, approximate the inverse Hessian matrix using rank-1 or rank-2 update schemes. For solving the optimization problem presented in Equation (1), we employ the Sequential Least Squares Programming (SLSQP) method [47]. This approach combines the Han-Powell quasi-Newton method [48,49] with a BFGS update of the Hessian approximation and uses an $\ell_1$ test function in the step-length algorithm. SLSQP optimizes successive second-order approximations of the objective function while maintaining first-order approximations of the constraints. The Sequential Quadratic Programming (SQP) method provides an effective framework for solving problems of the form (1) when derivatives of $f$ and $c_i$, $i \in \mathcal{E} \cup \mathcal{I}$, are available. The method utilizes the Lagrangian function, defined as:

\[ \mathcal{L}(x, \lambda) = f(x) - \sum_{i \in \mathcal{E} \cup \mathcal{I}} \lambda_i c_i(x) \tag{4} \]

where $\lambda$ represents the dual variable (Lagrange multiplier) vector. At each iteration $k$, the algorithm approximates the Hessian matrix $B_k \approx \nabla^2_{xx} \mathcal{L}(x_k, \lambda_k)$ and generates a step $d_k$ by solving the following subproblem:

\[
\begin{aligned}
\min_{d \in \mathbb{R}^n} \quad & f(x_k) + \nabla f(x_k)^{\top} d + \tfrac{1}{2}\, d^{\top} B_k\, d && \text{(5a)} \\
\text{s.t.} \quad & c_i(x_k) + \nabla c_i(x_k)^{\top} d = 0, \quad i \in \mathcal{E} && \text{(5b)} \\
& c_i(x_k) + \nabla c_i(x_k)^{\top} d \ge 0, \quad i \in \mathcal{I} && \text{(5c)}
\end{aligned}
\]

The SLSQP algorithm solves these subproblems sequentially until reaching convergence, typically determined by the relative change in objective function value or gradient norm [47]. This approach effectively balances computational efficiency with optimization accuracy.
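As a concrete illustration of how SLSQP consumes analytic gradients and constraint Jacobians, the following sketch minimizes the two-dimensional Rosenbrock function with SciPy's SLSQP implementation. The equality constraint $x_1 + x_2 = 2$, the bounds, and the starting point are our own illustrative choices, not taken from the paper:

```python
import numpy as np
from scipy.optimize import minimize

# 2-D Rosenbrock objective with its analytic gradient
def f(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad_f(x):
    return np.array([
        -400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0]**2),
    ])

# one illustrative equality constraint: x0 + x1 = 2
cons = [{"type": "eq",
         "fun": lambda x: x[0] + x[1] - 2.0,
         "jac": lambda x: np.array([1.0, 1.0])}]

res = minimize(f, x0=np.zeros(2), jac=grad_f, method="SLSQP",
               constraints=cons, bounds=[(-5.0, 5.0)] * 2)
print(res.x)  # converges near (1, 1), which also satisfies the constraint
```

Because the unconstrained minimizer $(1, 1)$ happens to lie on the constraint line, the constrained solution coincides with it, which makes convergence easy to check.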

Derivative-free optimization.

Derivative-free optimization (DFO) methods address optimization problems where gradient information is unavailable or unreliable. These methods use direct objective function evaluations to guide the search process. Common DFO approaches include evolutionary algorithms, pattern search methods, and trust-region methods, each offering different strategies for exploring the solution space without derivative information. In our implementation, we employ the NGOpt and NGOptRW methods [50] from the Nevergrad library [51], a state-of-the-art derivative-free optimization framework. NGOpt acts as a meta-algorithm that dynamically selects and combines various optimization strategies based on problem characteristics. The method operates through a systematic two-phase process. First, it selects appropriate algorithms by analyzing problem features and comparing them with benchmark performance data. Second, it adapts the chosen algorithms by adjusting their parameters for the specific optimization task. NGOptRW is a specialized variant of NGOpt, specifically designed for real-world applications. While NGOpt excels in theoretical benchmarks, NGOptRW incorporates modifications that enhance its effectiveness in practical scenarios [52]. This adaptation makes NGOptRW particularly suitable for optimization problems where theoretical assumptions may not fully align with real-world conditions.
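To make the contrast with gradient-based methods concrete, the following minimal coordinate pattern search illustrates the general DFO idea of probing the objective with direct function evaluations. It is a deliberately simplified sketch of the pattern-search family, not Nevergrad's NGOpt algorithm; step sizes and the sphere test function are illustrative:

```python
import numpy as np

def pattern_search(f, x0, step=1.0, shrink=0.5, tol=1e-6, max_iter=1000):
    """Minimal coordinate pattern search: probe +/- step along each axis,
    move to any improving point, shrink the step when no probe improves."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for i in range(x.size):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                ft = f(trial)
                if ft < fx:           # greedy move to the improving probe
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= shrink            # refine the mesh around the incumbent
            if step < tol:
                break
    return x, fx

# sphere function: global minimum 0 at the origin
x_best, f_best = pattern_search(lambda x: float(np.sum(x**2)), [3.0, -2.0])
```

Note that every piece of progress comes from objective evaluations alone, which is why evaluation counts grow quickly with dimension for this class of methods.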

Heuristic algorithms.

Heuristic algorithms provide efficient solutions to complex optimization problems through nature-inspired search strategies. These methods balance exploration of the solution space with exploitation of promising regions, making them particularly effective for problems with multiple local optima [53].

Genetic algorithms (GAs) draw inspiration from biological evolution, using principles of natural selection and genetics. The algorithm maintains a population of potential solutions, each encoded as a chromosome. Through iterative cycles of selection, crossover, and mutation operations, the population evolves toward better solutions. The selection process favors solutions with higher fitness values, while crossover combines characteristics from parent solutions to create offspring. Mutation introduces random variations to maintain population diversity and prevent premature convergence [54].

Particle swarm optimization (PSO) mimics the collective behavior of bird flocks or fish schools. Each particle in the swarm represents a candidate solution, moving through the solution space with an adjustable velocity. Particles update their positions based on both their personal best experience and the global best solution found by the swarm. The social intelligence mechanism enables particles to explore promising regions while maintaining diversity in their search patterns [55,56].

Simulated annealing (SA) derives from the physical process of metal annealing. The algorithm starts with a high "temperature" parameter that enables extensive exploration of the solution space. As the temperature gradually decreases according to a cooling schedule, the algorithm becomes more selective in accepting new solutions. This controlled transition from exploration to exploitation helps avoid local optima while eventually converging to high-quality solutions [57].
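The acceptance rule and cooling schedule described above can be sketched in a few lines. The proposal scale, cooling rate, and multimodal test function below are illustrative choices, not parameters used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_annealing(f, x0, t0=1.0, cooling=0.995, n_iter=5000, scale=0.5):
    """Minimal SA sketch: propose a Gaussian move, always accept improvements,
    and accept uphill moves with probability exp(-delta / temperature)."""
    x = np.asarray(x0, dtype=float)
    fx, t = f(x), t0
    best_x, best_f = x.copy(), fx
    for _ in range(n_iter):
        trial = x + rng.normal(scale=scale, size=x.size)
        ft = f(trial)
        # Metropolis acceptance criterion
        if ft < fx or rng.random() < np.exp(-(ft - fx) / t):
            x, fx = trial, ft
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f

# multimodal 1-D test: global minimum f(0) = 0, with local minima elsewhere
best_x, best_f = simulated_annealing(
    lambda x: float(x[0]**2 + 2.0 * np.sin(3.0 * x[0])**2), [4.0])
```

The early high-temperature phase lets the walk escape the local basins created by the sine term; the late low-temperature phase behaves essentially greedily.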

Machine learning

Machine learning models have demonstrated remarkable capabilities in capturing complex relationships between input and output variables. We focus on two widely used approaches: XGBoost and Neural Networks (NNs), which represent distinct paradigms in machine learning with different mathematical properties. XGBoost (eXtreme Gradient Boosting) [58] implements gradient boosting decision trees [59] through an ensemble learning approach. The algorithm constructs a sequence of decision trees iteratively, where each new tree focuses on the residual errors of previous predictions. At each iteration, XGBoost builds a new tree by optimizing a second-order Taylor expansion of the loss function, incorporating both gradient and Hessian information. The final prediction function for an input x takes the form:

\[ \hat{y}(x) = \sum_{k=1}^{K} f_k(x) \tag{6} \]

where K is the total number of trees, and fk represents the prediction of the k-th tree. Each tree function fk partitions the input space through a series of binary splits, creating a piecewise constant function. This structure, while effective for prediction, results in a non-differentiable function due to the discrete nature of decision boundaries and the step-wise predictions at leaf nodes.
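A single decision stump already exhibits this behavior: the prediction is flat on each side of the split, so a finite-difference "gradient" is zero almost everywhere, and the function jumps discontinuously at the threshold. The threshold and leaf values below are arbitrary illustrative numbers:

```python
import numpy as np

def stump_predict(x, threshold=0.5, left_value=1.0, right_value=3.0):
    """A single decision stump: a piecewise-constant function of x."""
    return np.where(x < threshold, left_value, right_value)

eps = 1e-6
# away from the split, a finite-difference gradient is exactly zero...
g_flat = (stump_predict(0.2 + eps) - stump_predict(0.2)) / eps
# ...while the function jumps discontinuously across the threshold itself
jump = stump_predict(0.5 + eps) - stump_predict(0.5 - eps)
print(float(g_flat), float(jump))  # 0.0 and 2.0
```

Summing many such stumps, as a boosted ensemble does, only multiplies the number of jump locations; it never restores differentiability.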

On the other hand, NNs construct differentiable mapping functions through multiple layers of interconnected nodes. In a typical feedforward architecture, each layer transforms its inputs through a composition of linear operations and non-linear activation functions. For a layer $l$, the output $h_l$ is computed as:

\[ h_l = \sigma\!\left( W_l h_{l-1} + b_l \right) \tag{7} \]

where $W_l$ represents the weight matrix, $b_l$ is the bias vector, and $\sigma$ is a differentiable activation function such as ReLU [60] or sigmoid. The complete network function is a composition of these layer transformations:

\[ F(x) = \left( h_L \circ h_{L-1} \circ \cdots \circ h_1 \right)(x) \tag{8} \]

This architecture ensures end-to-end differentiability, enabling efficient training through backpropagation and making NNs particularly suitable for gradient-based optimization tasks.
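The input gradient that backpropagation provides can be verified on a tiny feedforward network. The two-layer architecture, tanh activation, and random weights below are illustrative, and the analytic gradient is checked against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(42)

# tiny 2-layer MLP: h1 = tanh(W1 x + b1), y = w2 . h1 + b2
W1, b1 = rng.normal(size=(8, 3)), np.zeros(8)
w2, b2 = rng.normal(size=8), 0.0

def forward(x):
    h1 = np.tanh(W1 @ x + b1)
    return float(w2 @ h1 + b2)

def input_gradient(x):
    """Analytic dy/dx via the chain rule: W1^T ((1 - h1^2) * w2)."""
    h1 = np.tanh(W1 @ x + b1)
    return W1.T @ ((1.0 - h1**2) * w2)

x = rng.normal(size=3)
g_analytic = input_gradient(x)

# central finite-difference check, one coordinate at a time
eps = 1e-6
g_numeric = np.array([
    (forward(x + eps * e) - forward(x - eps * e)) / (2.0 * eps)
    for e in np.eye(3)
])
```

The same chain-rule computation, applied layer by layer, is what an automatic-differentiation framework performs when one backpropagates with respect to the network inputs rather than the weights.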

Note, however, that while we use XGBoost and NNs to validate our methods, these principles extend to other machine learning models. Algorithms such as LightGBM [61] and random forests [59] share XGBoost’s non-differentiable characteristics and thus could benefit from independently trained surrogate models.

Differentiable surrogate model

Let us consider three functions $f$, $\hat{f}$, and $\tilde{f}$, each mapping $D \times P \to \mathbb{R}$ for a domain $D \subset \mathbb{R}^n$ and parameter set $P$. These functions satisfy the following fundamental properties:

  1. The approximation error of $\hat{f}$ is strictly less than that of $\tilde{f}$ in the $L^2$ norm:
\[ \| f - \hat{f} \|_{L^2(D)} < \| f - \tilde{f} \|_{L^2(D)} \tag{9} \]
  2. For each fixed parameter $p \in P$, the function $x \mapsto \tilde{f}(x, p)$ belongs to $C^1(D)$, the space of continuously differentiable functions on $D$.

In this framework, $f$ represents the true underlying function that governs the process of interest. While this function is typically unknown in practical applications, it serves as the theoretical benchmark for our analysis. The function $\hat{f}$ provides a high-fidelity approximation of $f$, minimizing the mean squared error (MSE) as measured by the $L^2$ norm over the domain $D$. This approximation is typically realized through data-driven modeling approaches.

The function $\tilde{f}$ serves as a differentiable surrogate model. Although $\tilde{f}$ exhibits inferior approximation quality compared to $\hat{f}$ (as quantified by property (i)), it possesses the crucial property of continuous differentiability. This mathematical regularity enables the application of gradient-based optimization techniques.

For any given parameter vector $p \in P$, our primary objective is to solve the optimization problem:

\[ x^*(p) = \arg\min_{x \in \Omega} f(x, p) \tag{10} \]

Given that f is generally unknown, we instead consider the approximate optimization problem:

\[ \hat{x}^*(p) = \arg\min_{x \in \Omega} \hat{f}(x, p) \tag{11} \]

When $\hat{f}$ lacks sufficient regularity, this optimization problem necessitates derivative-free methods to compute an approximation $\hat{x}^*(p)$. Such methods typically demand substantial computational resources, potentially rendering them impractical for real-time applications.

Alternatively, we can consider the surrogate optimization problem:

\[ \tilde{x}^*(p) = \arg\min_{x \in \Omega} \tilde{f}(x, p) \tag{12} \]

The continuous differentiability of $\tilde{f}$ (property (ii)) enables the use of efficient gradient-based optimization methods such as SLSQP. The gradient information can be computed analytically through:

\[ \nabla_x \tilde{f}(x, p) = \frac{\partial \tilde{f}}{\partial h_L}\,\frac{\partial h_L}{\partial h_{L-1}} \cdots \frac{\partial h_1}{\partial x} \tag{13} \]

where $h_l$ represents the output of the $l$-th layer in the NN. However, property (i) implies that the surrogate solution $\tilde{x}^*$ may exhibit greater deviation from the true optimum, as measured by $\| \tilde{x}^* - x^* \|$, where $x^* = \arg\min_{x \in \Omega} f(x, p)$.

The optimization process explicitly leverages the differentiability of neural networks while maintaining XGBoost’s superior prediction accuracy. Specifically, when $\mathrm{MSE}_{\mathrm{XGB}} < \mathrm{MSE}_{\mathrm{NN}}$, we extract gradient information from the neural network’s prediction function through backpropagation:

\[ g(x) = \nabla_x \tilde{f}(x, p) \tag{14} \]

This gradient information is then fed into the SLSQP optimizer to minimize $\tilde{f}(x, p)$. Once the optimizer finds the optimal solution $\tilde{x}^*$, we obtain the final prediction using XGBoost: $\hat{y} = \hat{f}(\tilde{x}^*, p)$. This approach combines the differentiability advantage of neural networks for optimization with XGBoost’s superior prediction capabilities.
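This two-model division of labor can be sketched end to end. Since the point is the structure rather than the specific learners, the sketch below substitutes a nearest-neighbour lookup for the accurate-but-non-differentiable model and a least-squares quadratic fit for the differentiable surrogate; only the pattern (optimize using the surrogate's gradient via SLSQP, then predict with the accurate model) mirrors the paper's framework:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# ground-truth function to be learned (unknown in practice)
def f_true(x):
    return (x[..., 0] - 1.0)**2 + (x[..., 1] + 0.5)**2

X_train = rng.uniform(-3.0, 3.0, size=(2000, 2))
y_train = f_true(X_train)

# "accurate but non-differentiable" model: 1-nearest-neighbour lookup
# (piecewise constant, standing in for XGBoost)
def f_hat(x):
    i = np.argmin(np.sum((X_train - x)**2, axis=1))
    return y_train[i]

# differentiable surrogate: least-squares quadratic fit (standing in for the NN)
A = np.column_stack([X_train[:, 0]**2, X_train[:, 1]**2,
                     X_train[:, 0], X_train[:, 1], np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def f_tilde(x):
    return coef @ np.array([x[0]**2, x[1]**2, x[0], x[1], 1.0])

def grad_f_tilde(x):
    return np.array([2.0 * coef[0] * x[0] + coef[2],
                     2.0 * coef[1] * x[1] + coef[3]])

# optimise the surrogate with SLSQP, then predict with the accurate model
res = minimize(f_tilde, x0=np.zeros(2), jac=grad_f_tilde,
               method="SLSQP", bounds=[(-3.0, 3.0)] * 2)
y_final = f_hat(res.x)
```

The surrogate supplies the gradients SLSQP needs; the piecewise-constant model is only ever evaluated, never differentiated.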

Our approach consists of two phases: model training and optimization, as illustrated in Fig 1. The framework systematically handles both differentiable and non-differentiable models to achieve optimal results.

In the training phase, we construct two machine learning models: an XGBoost model ($\hat{f}$) and a NN model ($\tilde{f}$). These models are trained on the same dataset to minimize their respective loss functions:

\[ \min_{\theta_{\mathrm{XGB}}} \sum_{i=1}^{N} \left( y_i - \hat{f}(x_i; \theta_{\mathrm{XGB}}) \right)^2 \tag{15} \]
\[ \min_{\theta_{\mathrm{NN}}} \sum_{i=1}^{N} \left( y_i - \tilde{f}(x_i; \theta_{\mathrm{NN}}) \right)^2 \tag{16} \]

The models’ performance is evaluated using the MSE:

\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \tag{17} \]

The optimization strategy bifurcates based on the comparative MSE performance:

  • Case 1: If $\mathrm{MSE}_{\mathrm{NN}} \le \mathrm{MSE}_{\mathrm{XGB}}$, we directly optimize using the NN model:
\[ \tilde{x}^* = \arg\min_{x \in \Omega} \tilde{f}(x, p) \tag{18} \]
    The final prediction is then computed as $\hat{y} = \tilde{f}(\tilde{x}^*, p)$.
  • Case 2: If $\mathrm{MSE}_{\mathrm{XGB}} < \mathrm{MSE}_{\mathrm{NN}}$, we employ the NN as a differentiable surrogate model while leveraging XGBoost’s superior prediction accuracy. The optimization problem becomes:
\[ \tilde{x}^* = \arg\min_{x \in \Omega} \tilde{f}(x, p) \tag{19} \]
    with the final prediction calculated as $\hat{y} = \hat{f}(\tilde{x}^*, p)$.

The pseudocode for our framework is presented in Algorithm 1. The computational cost of each SLSQP iteration comprises the XGBoost prediction (proportional to the number of trees $K$), the NN gradient computation (scaling with the network depth $L$ and maximum hidden width $h$), and the SLSQP update (quadratic in the input dimension $d$, with additional cost growing in the number of constraints $m$). With $i$ iterations, the total optimization cost is the sum of these per-iteration terms, while the space complexity is dominated by SLSQP's working memory, which is quadratic in $d$. The quadratic programming step within SLSQP dominates the overall computational cost.

For non-smooth optimization problems, we establish the theoretical convergence properties of our differentiable surrogate approach. Let $f$ be a potentially non-smooth objective function and $\tilde{f}$ its differentiable surrogate. The approximation error can be bounded by:

\[ \sup_{x \in D} \left| f(x) - \tilde{f}(x) \right| \le \epsilon \tag{20} \]

where $\epsilon$ depends on the neural network architecture and training process. For high-dimensional problems, the approximation error scales with the dimension $d$ as:

\[ \epsilon(d) = O\!\left( d^{\alpha} \right) \tag{21} \]

where $\alpha$ is determined by the smoothness properties of $f$.

Algorithm 1: Differentiable surrogate model optimization framework.

Require:
1: Training data $\{(x_i, y_i)\}_{i=1}^{N}$
2: Optimization domain $\Omega$
3: Constraint functions $c_i$, $i \in \mathcal{E} \cup \mathcal{I}$
Ensure:
4: Optimal solution $\tilde{x}^*$ and predicted value $\hat{y}$
5: Phase 1: Model Training
6: Train XGBoost model $\hat{f}$ on the training data
7: Train Neural Network model $\tilde{f}$ on the training data
8: Compute $\mathrm{MSE}_{\mathrm{XGB}}$ and $\mathrm{MSE}_{\mathrm{NN}}$ on validation set
9: Phase 2: Optimization
10: if $\mathrm{MSE}_{\mathrm{NN}} \le \mathrm{MSE}_{\mathrm{XGB}}$ then
11:  $\tilde{x}^* \leftarrow$ OptimizeSLSQP($\tilde{f}$)
12:  $\hat{y} \leftarrow \tilde{f}(\tilde{x}^*)$
13: else
14:  $\tilde{x}^* \leftarrow$ OptimizeSLSQP($\tilde{f}$)
15:  $\hat{y} \leftarrow \hat{f}(\tilde{x}^*)$
16: end if
17: function OptimizeSLSQP($g$)
18:   Initialize $X \leftarrow x_0$
19:   while not converged do
20:    Compute gradient $\nabla_X g(X)$ via backpropagation
21:    Update $X$ using SLSQP with the computed gradient
22:   end while
23:   return $X$
24: end function
25: return $\tilde{x}^*$, $\hat{y}$

Under these conditions, if $x^*$ is a local minimum of $f$ and $\tilde{x}^*$ is the corresponding local minimum of $\tilde{f}$ found by our algorithm, then:

\[ \| \tilde{x}^* - x^* \| \le C\,\epsilon \tag{22} \]

where $C$ depends on the Lipschitz constants of $f$ and $\tilde{f}$. This bound ensures that solutions found using our differentiable surrogate converge to true optima as the approximation quality improves.

For constrained problems with m constraints, the feasibility gap satisfies:

\[ \max_{1 \le i \le m} \left| c_i(\tilde{x}^*) \right| \le C_m\,\epsilon \tag{23} \]

These theoretical guarantees support the empirical effectiveness of our approach in both non-smooth and high-dimensional optimization scenarios.

Numerical experiments

We have applied our method to three widely used benchmark functions, namely, the Rosenbrock function, the Levy function, and the Rastrigin function. In particular, we tested these functions with different dataset sizes. Note that for every test function, we set up two types of variables: n independent variables and m variables that were randomly designated as constant. For comparison, we employed the following optimization strategies:

  • the XGBoost model is optimized directly with the gradient-free methods NGOpt and NGOptRW;
  • the XGBoost model is optimized directly with the heuristic algorithms – GA, PSO, and SA;
  • the XGBoost model is optimized with SLSQP using the gradients of the NN model function;
  • the NN model function is optimized with SLSQP.

All computations were performed using an Intel Core Ultra 7 155H Processor with 16 GB memory. For a fair comparison, we set the iteration count, which is a key hyperparameter for optimization algorithms, to 100. Further, we performed 50 independent simulations for each configuration.

Rosenbrock function

The Rosenbrock function [62], commonly known as the Valley or Banana function, is a particularly important benchmark for gradient-based optimization algorithms. Fig 2 illustrates this function in its two-dimensional form.

Rather than this two-dimensional form, we use a multidimensional extension of the problem [63], which is given by

\[ f(\mathbf{x}) = \sum_{i=1}^{n-1} \left[ 100 \left( x_{i+1} - x_i^2 \right)^2 + \left( 1 - x_i \right)^2 \right] \tag{24} \]
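The multidimensional Rosenbrock sum translates directly into vectorized code; the global minimum at $x = (1, \dots, 1)$, where the function value is exactly zero, gives a quick sanity check:

```python
import numpy as np

def rosenbrock(x):
    """Multidimensional Rosenbrock function."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2))

val = rosenbrock(np.ones(10))
print(val)  # 0.0 at the global minimum
```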

We select m values of either 5 or 10, and n values of either 10 or 15. For the equality constraints, we have:

(25)

We follow the procedure described in Sect 3. First consider our data generation process: we created a synthetic dataset of 50,000 data points using a truncated normal distribution within the range [0,10]. Each dataset corresponds to a specific dimensionality n. We then computed the Rosenbrock function value for each data point. The datasets were subsequently partitioned into training and testing subsets using a random split, with 90% of the data allocated for training and the remaining 10% reserved for testing. The two models trained on the limited dataset show notable differences in accuracy, as shown in Table 1. Figs 3–5 show our analysis using line plots and box plots. Furthermore, Figs 6–8 display the processing time for each optimization method and their constraint deviations.
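A data-generation step along these lines can be sketched with SciPy's truncated normal; the location and scale parameters below are illustrative choices, since the paper does not state them:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
n_dim, n_samples = 10, 50_000

# truncated normal on [0, 10]; mean 5 and sd 2.5 are assumed, not from the paper
lo, hi, mu, sigma = 0.0, 10.0, 5.0, 2.5
a, b = (lo - mu) / sigma, (hi - mu) / sigma  # scipy's standardized bounds
X = truncnorm.rvs(a, b, loc=mu, scale=sigma,
                  size=(n_samples, n_dim), random_state=rng)

# 90/10 train/test split via a random permutation
perm = rng.permutation(n_samples)
split = int(0.9 * n_samples)
X_train, X_test = X[perm[:split]], X[perm[split:]]
```

Note that `truncnorm` takes its bounds in standardized units, hence the `(lo - mu) / sigma` conversion.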

Fig 6. Rosenbrock function processing time and deviations from constraint.

https://doi.org/10.1371/journal.pone.0321862.g006

Fig 7. Rosenbrock function processing time and deviations from constraint.

https://doi.org/10.1371/journal.pone.0321862.g007

Fig 8. Rosenbrock function processing time and deviations from constraint.

https://doi.org/10.1371/journal.pone.0321862.g008

Table 1. Optimization outcome for the Rosenbrock function.

https://doi.org/10.1371/journal.pone.0321862.t001

First consider the machine learning performance: the XGBoost model achieves better prediction accuracy than the NN model across all test cases. For n = 10 and m = 5, XGBoost’s MSE is 24,806, while the NN’s MSE is 43,173. This pattern continues in larger problems. For both n = 15 cases, XGBoost maintains an MSE of 38,764, while the NN’s MSE increases to 52,248.

The results demonstrate clear performance differences among various optimization approaches for the Rosenbrock function. For the first test case (n = 10, m = 5), we see that traditional derivative-free methods using XGBoost face significant challenges. The NGOpt and NGOptRW methods produce high Rosenbrock values of 981,443 and 948,795, requiring over 3 seconds to compute. The GA achieves better results with a value of 389,037, but requires much more time at 100.46 seconds. The heuristic methods like PSO and SA show promising results in terms of optimization values, reaching 289,427 and 291,806 respectively. SA is particularly efficient, completing in just 2.45 seconds. However, these methods struggle significantly with constraint satisfaction. Their constraint violations are very high—PSO reaches 59.451% and SA reaches 62.559%.

For the middle test case (n = 15, m = 5), similar patterns emerge but with some notable differences. The derivative-free methods using XGBoost continue to struggle with optimization quality. NGOpt and NGOptRW produce high Rosenbrock values of 1,301,320 and 1,239,373, requiring about 3.1 seconds each. The GA method performs better than in the smaller problem, achieving a value of 696,357, but still requires a long computation time of 55.12 seconds. PSO and SA again show competitive optimization results with values of 584,123 and 583,506. SA maintains its computational efficiency at 2.46 seconds. However, their constraint violations become even worse in this larger problem—PSO reaches 65.6843% and SA reaches 71.2083%. The differentiable surrogate model maintains its superior performance, achieving the best Rosenbrock value of 565,953 in just 0.13 seconds, while keeping constraint violations at 0.4272%.

The differentiable surrogate model approach shows remarkable performance across all metrics. By combining XGBoost’s accuracy with NN gradients in SLSQP, it achieves the best Rosenbrock value of 260,039. This optimization completes in just 0.35 seconds and maintains excellent constraint satisfaction with only 0.0022% deviation. This performance advantage becomes even more pronounced in larger problem sizes. For n = 15 and m = 10, the traditional methods struggle more, with NGOpt and NGOptRW producing much higher values around 2 million. While SA maintains relatively good optimization with 322,232, its constraint violation increases to 73.5338%. In contrast, the differentiable surrogate model achieves the best value of 312,908 in just 0.14 seconds, while keeping constraint violations at a mere 0.017%. This performance clearly shows that the approach scales well with increased problem dimensions, maintaining both optimization quality and constraint satisfaction.

Levy function

The Levy function is a challenging benchmark problem in global optimization, known for its highly nonlinear and multimodal characteristics [64]. Fig 9 shows this function in its two-dimensional form. We use a multidimensional version of the Levy function defined as:

\[ f(\mathbf{x}) = \sin^2(\pi w_1) + \sum_{i=1}^{n-1} \left( w_i - 1 \right)^2 \left[ 1 + 10 \sin^2(\pi w_i + 1) \right] + \left( w_n - 1 \right)^2 \left[ 1 + \sin^2(2 \pi w_n) \right] \tag{26} \]

where

\[ w_i = 1 + \frac{x_i - 1}{4}, \quad i = 1, \dots, n. \tag{27} \]
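The Levy function and its auxiliary variables $w_i$ translate directly into code; the global minimum at $x = (1, \dots, 1)$, where all three terms vanish, provides a quick sanity check:

```python
import numpy as np

def levy(x):
    """Multidimensional Levy function."""
    x = np.asarray(x, dtype=float)
    w = 1.0 + (x - 1.0) / 4.0
    term1 = np.sin(np.pi * w[0])**2
    wi = w[:-1]
    term2 = np.sum((wi - 1.0)**2 * (1.0 + 10.0 * np.sin(np.pi * wi + 1.0)**2))
    term3 = (w[-1] - 1.0)**2 * (1.0 + np.sin(2.0 * np.pi * w[-1])**2)
    return float(term1 + term2 + term3)

val = levy(np.ones(15))  # essentially 0 at the global minimum
```

The many sine terms are what make the landscape highly multimodal and therefore difficult for purely local search.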

Similar to our Rosenbrock function experiment setup, we consider optimization performance across different dimensions and constraint configurations. We test cases with n values of 15 or 20 variables, and m values of 5 or 10 constant variables. Note that the dataset size for this function is 100,000. The complete results of our optimization experiments are presented in Table 2. We show detailed visualizations of these results through line plots and box plots in Figs 10–13. Additionally, Figs 14–17 illustrate the processing time and constraint satisfaction of each method.

Fig 14. Levy function processing time and deviations from constraint.

https://doi.org/10.1371/journal.pone.0321862.g014

Fig 15. Levy function processing time and deviations from constraint for .

https://doi.org/10.1371/journal.pone.0321862.g015

Fig 16. Levy function processing time and deviations from constraint for .

https://doi.org/10.1371/journal.pone.0321862.g016

Fig 17. Levy function processing time and deviations from constraint for .

https://doi.org/10.1371/journal.pone.0321862.g017

Following our Rosenbrock function optimization framework, we construct the NN as a surrogate model to provide gradient information. The XGBoost model shows better prediction accuracy with MSE of 20.89 for n = 15 problems and 37.86 for n = 20 problems, while the NN has higher MSE values of 32.63 and 123.16 respectively.
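One plausible minimal realization of this pairing uses SciPy's SLSQP, taking objective values from the non-differentiable predictor and the Jacobian from the differentiable surrogate. The models below are illustrative stand-ins, not the paper's trained XGBoost and NN; the constraint and dimension are likewise hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative stand-ins for the two trained models: in the actual pipeline,
# objective values come from the fitted XGBoost regressor and gradients from
# the neural-network surrogate trained on the same data.
def xgb_value(x):            # non-differentiable predictor (value only)
    return float(np.sum((x - 1.0) ** 2))

def nn_gradient(x):          # differentiable surrogate supplies the Jacobian
    return 2.0 * (x - 1.0)

n = 5
x0 = np.zeros(n)
constraints = [{"type": "eq", "fun": lambda x: np.sum(x) - 2.0}]  # toy constraint

result = minimize(xgb_value, x0, jac=nn_gradient, method="SLSQP",
                  bounds=[(-5.0, 5.0)] * n, constraints=constraints)
```

SLSQP queries `xgb_value` for objective values and `nn_gradient` for search directions, so the tree-based model never needs to supply derivatives.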

For the first case, the derivative-free methods using XGBoost show limited performance. NGOpt and NGOptRW produce high Levy values of 136.34 and 135.61, requiring about 3 seconds each. The GA achieves better results with 115.98 but requires a longer computation time (53.87 seconds). PSO and SA find competitive solutions (91.08 and 91.50), with SA being notably faster (2.41 seconds). However, these methods have severe constraint violations: PSO reaches 49.762% and SA reaches 63.719%. The differentiable surrogate model achieves the best Levy value of 81.42 in just 0.36 seconds, with nearly perfect constraint satisfaction (0.0002% deviation). When increasing the constant variables to m = 10 while keeping n = 15, the performance gap widens. NGOpt and NGOptRW’s solutions deteriorate to 169.25 and 159.30. While SA finds a good solution of 44.76, its constraint violation increases to 72.2693%. The differentiable surrogate model maintains excellent performance, achieving the best value of 41.03 in 0.47 seconds with zero constraint violation.

For larger problems, the trend continues. Traditional methods struggle more, with NGOpt and NGOptRW producing values of 189.15 and 176.07. The GA maintains reasonable constraint satisfaction but requires 52.01 seconds. The differentiable surrogate model again shows superior performance, finding a solution of 127.02 in just 0.16 seconds with minimal constraint violation (0.0005%). In the most complex case, the advantages become even more pronounced. The differentiable surrogate model achieves the best value of 89.78 in 0.14 seconds with zero constraint violation, while other methods either produce much higher values (NGOpt: 221.42, NGOptRW: 203.44) or severe constraint violations (SA: 70.9141%, PSO: 54.9921%).

Rastrigin function

Finally, we consider the Rastrigin function, a widely used benchmark problem in global optimization. It presents a challenging landscape characterized by numerous local minima arranged in a regular, symmetric pattern [64]. The 2D Rastrigin function is illustrated in Fig 18. For our analysis, we use a multidimensional version of the Rastrigin function defined as:

$$
f(\mathbf{x}) = 10n + \sum_{i=1}^{n} \left[ x_i^2 - 10 \cos(2\pi x_i) \right] \tag{28}
$$
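In this standard form the Rastrigin function is a one-liner in NumPy; a minimal sketch (the function name is ours):

```python
import numpy as np

def rastrigin(x):
    """Standard multidimensional Rastrigin function: 10n + sum(x^2 - 10cos(2*pi*x))."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + float(np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x)))
```

The global minimum is at the origin, where the quadratic and cosine terms cancel the constant exactly.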

Here, we consider the Rastrigin function with increased problem dimensions compared to our previous experiments. We set the number of variables n to either 20 or 25, representing larger scale optimization problems. For each n value, we maintain our previous constraint configurations by setting the number of constant variables m to either 5 or 10. For the Rastrigin function experiments, we generate a dataset of 200,000 samples. The complete optimization results are presented in Table 3. We illustrate the optimization performance through line plots and box plots in Figs 19–22. The computational efficiency and constraint satisfaction of each method are shown in Figs 23–26.

Fig 23. Rastrigin function processing time and deviations from constraint for .

https://doi.org/10.1371/journal.pone.0321862.g023

Fig 24. Rastrigin function processing time and deviations from constraint for .

https://doi.org/10.1371/journal.pone.0321862.g024

Fig 25. Rastrigin function processing time and deviations from constraint for .

https://doi.org/10.1371/journal.pone.0321862.g025

Fig 26. Rastrigin function processing time and deviations from constraint for .

https://doi.org/10.1371/journal.pone.0321862.g026

Table 3. Optimization outcome for the Rastrigin function.

https://doi.org/10.1371/journal.pone.0321862.t003

The optimization experiments on the Rastrigin function show clear performance patterns across different problem dimensions. The machine learning models demonstrate consistent prediction characteristics: XGBoost achieves better accuracy with MSE values of 266.40 and 439.01, while the neural network produces higher MSE values of 1021.84 and 1291.68, respectively.

The results demonstrate several key findings. In smaller problems, derivative-free methods deliver moderate performance. NGOpt and NGOptRW require about 3 seconds to reach solutions around 760, while GA produces better constraint satisfaction but needs significantly more time (54.91 seconds). PSO and SA reach improved solutions near 560, but their constraint violations exceed 79%.

The effectiveness of our differentiable surrogate model becomes clear through its consistent performance. The model achieves the lowest Rastrigin value (530.81) in just 0.10 seconds while maintaining perfect constraint satisfaction. This superior performance continues as problem complexity increases. In one larger configuration, the model reaches a value of 342.69 in 0.27 seconds; in another, it obtains a solution of 491.33 in 0.24 seconds. In both cases, it maintains zero constraint violations. In comparison, other methods show declining performance with larger dimensions: NGOpt and NGOptRW produce values approaching 1000, and PSO and SA’s constraint violations reach up to 87%.

These optimization results confirm the advantage of combining gradient information from the NN with XGBoost’s accurate prediction capabilities. The gradient guidance from the NN enables SLSQP to find better solutions consistently across all test scenarios, despite the NN’s higher MSE values. The differentiable surrogate model achieves the best Rastrigin values with perfect constraint satisfaction, and requires only a fraction of the computational time compared to traditional methods. This advantage becomes more pronounced as the problem dimension and constraint complexity increase, which demonstrates the robustness of our approach.

Real world case

To further validate the effectiveness of our approach, we consider a real-world optimization problem in materials engineering. The dataset contains 15,000 samples with 15 input variables representing the amounts of different materials in the mixture, and the output representing the elongation measurement. The optimization problem can be formulated as:

$$
\begin{aligned}
\max_{\mathbf{x}} \quad & D(\mathbf{x}) && \text{(29a)} \\
\text{s.t.} \quad & x_i \le u_i, \quad i = 1, \ldots, 15 && \text{(29b)} \\
& \sum_{i=1}^{15} x_i = 100 && \text{(29c)} \\
& x_i \ge 0, \quad i = 1, \ldots, 15 && \text{(29d)}
\end{aligned}
$$

where D(x) represents the elongation prediction model, u_i represents the upper bound for each material proportion, and the constraints ensure physical feasibility. The first constraint bounds each material’s proportion, the second constraint ensures the total mixture sums to 100%, and the third constraint ensures non-negativity of material proportions. This formulation reflects real-world manufacturing constraints while seeking to maximize the material’s elongation properties. The results are presented in Table 4 and Fig 27.
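The constraint handling in this formulation can be sketched with SciPy's SLSQP. The objective below is a hypothetical concave stand-in for the trained elongation model D(x), and the upper bounds u_i = 20 are illustrative; in the actual pipeline the value comes from XGBoost and the gradient from the NN surrogate:

```python
import numpy as np
from scipy.optimize import minimize

n = 15
u = np.full(n, 20.0)  # hypothetical upper bounds u_i

# Stand-in for the trained elongation model D(x); SLSQP minimizes, so we
# negate the objective to maximize elongation.
def neg_D(x):
    return float(-np.sum(np.sqrt(np.maximum(x, 1e-12))))

def neg_D_grad(x):
    return -0.5 / np.sqrt(np.maximum(x, 1e-12))

constraints = [{"type": "eq", "fun": lambda x: np.sum(x) - 100.0}]  # Eq. (29c)
bounds = [(0.0, ui) for ui in u]                                    # Eqs. (29b), (29d)
x0 = np.full(n, 100.0 / n)  # feasible starting mixture

result = minimize(neg_D, x0, jac=neg_D_grad, method="SLSQP",
                  bounds=bounds, constraints=constraints)
```

The bounds encode constraints (29b) and (29d) directly, while the sum-to-100 mixture constraint (29c) is passed as an equality constraint, which SLSQP satisfies to solver tolerance at every accepted iterate.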

Fig 27. Real-world elongation optimization results. Left: Loss values for each method. Right: Processing time for each method.

https://doi.org/10.1371/journal.pone.0321862.g027

Table 4. Optimization outcome for the elongation problem.

https://doi.org/10.1371/journal.pone.0321862.t004

We follow the same optimization configurations as in the benchmark functions, with XGBoost providing accurate predictions and the NN serving as a surrogate model for gradient information. The XGBoost model achieves an MSE of 2.87, while the NN has a higher MSE of 11.46. NGOpt and NGOptRW produce loss values around 90, while GA achieves a better value of 71.08 but requires 39.21 seconds. PSO and SA find competitive solutions near 45, but their constraint violations exceed 45%. The differentiable surrogate model achieves the best loss value of 37.19 in just 0.36 seconds, with zero constraint violation. The NN model also performs well, achieving a value of 39.18 in 0.40 seconds with zero constraint violation. The result again demonstrates the effectiveness of our approach in real-world optimization scenarios, where the differentiable surrogate model consistently outperforms traditional methods in both optimization performance and computational efficiency.

Conclusion

We presented an approach that employs independently trained differentiable machine learning models as surrogate models during optimization, applied to three benchmark datasets and a real-world application. Our methodology integrates two complementary techniques: XGBoost for its superior prediction accuracy and neural networks for providing gradient information. Through extensive testing on the Rosenbrock, Levy, and Rastrigin functions with varying dimensions and constraint conditions, as well as on a 15-dimensional real-world material design problem, we demonstrate the effectiveness of this approach. The experiments consistently show that our differentiable surrogate model achieves solutions up to 40% better than traditional methods while reducing computation time by orders of magnitude, in both theoretical benchmarks and practical applications.

A key contribution of our approach lies in combining XGBoost’s accurate predictions with a NN surrogate model that guides the SLSQP optimizer, or indeed any gradient-based optimization algorithm. Notably, this paradigm can be extended to other tree-based ensemble algorithms, such as LightGBM, CatBoost, and random forests, thereby offering flexibility while preserving the core advantages of our optimization strategy. This strategy proves particularly effective in handling complex optimization landscapes while maintaining strict constraint satisfaction. Across all benchmark functions, our method consistently outperforms both derivative-free approaches (NGOpt and NGOptRW) and heuristic algorithms (GA, PSO, and SA), not only in solution quality but also in computational efficiency and constraint handling. The results show near-zero constraint violations across all test cases, even as problem complexity increases.

While these results are promising, several limitations remain. The independent training of surrogate models requires extra computational cost, significantly increasing initial training time. Although offset by faster optimization runs, the process is computationally intensive for high-dimensional inputs, potentially limiting real-time or time-constrained applications. Moreover, performance may be sensitive to the neural network architecture and training parameters, which were not exhaustively explored in this study. Additionally, the current implementation requires training on the full dataset for both models, posing challenges for larger-scale problems.

Future research could address these challenges through several key directions. The development of adaptive training strategies could significantly reduce the computational burden while maintaining model accuracy, particularly for high-dimensional problems where training time becomes a critical factor. The methodology could be extended to multi-objective optimization problems, where surrogate models could provide gradient information for multiple competing objectives, opening new possibilities in complex engineering optimization scenarios. While we have demonstrated our approach’s effectiveness on a real-world steel alloy optimization problem, further applications in diverse domains like process optimization or structural design would provide additional insights into the method’s capabilities across different industrial contexts. Such expanded validation would help identify domain-specific challenges and opportunities for enhancing the method’s practical impact.

References

  1. Bishnu SK, Alnouri SY, Al-Mohannadi DM. Computational applications using data driven modeling in process systems: a review. Digit Chem Eng. 2023;8:100111.
  2. Yabe T, Rao PSC, Ukkusuri SV, Cutter SL. Toward data-driven, dynamical complex systems approaches to disaster resilience. Proc Natl Acad Sci U S A. 2022;119(8):e2111997119. pmid:35135891
  3. Ekundayo F. Leveraging AI-driven decision intelligence for complex systems engineering. Int J Res Publ Rev. 2024;5(11):1–10.
  4. Gupta R, Zhang Q. Data-driven decision-focused surrogate modeling. AIChE J. 2024;70(4).
  5. Cheng M, Zhao X, Dhimish M, Qiu W, Niu S. A review of data-driven surrogate models for design optimization of electric motors. IEEE Trans Transp Electrif. 2024. https://doi.org/10.1109/TTE.2024.3366417
  6. Ghafariasl P, Mahmoudan A, Mohammadi M, Nazarparvar A, Hoseinzadeh S, Fathali M, et al. Neural network-based surrogate modeling and optimization of a multigeneration system. Appl Energy. 2024;364:123130.
  7. Kudela J, Matousek R. Recent advances and applications of surrogate models for finite element method computations: a review. Soft Comput. 2022;26(24):13709–33.
  8. Tong H, Huang C, Minku LL, Yao X. Surrogate models in evolutionary single-objective optimization: a new taxonomy and experimental study. Inf Sci. 2021;562:414–37.
  9. Chen G, Zhang K, Xue X, Zhang L, Yao C, Wang J, et al. A radial basis function surrogate model assisted evolutionary algorithm for high-dimensional expensive optimization problems. Appl Soft Comput. 2022;116:108353.
  10. Berkemeier M, Peitz S. Derivative-free multiobjective trust region descent method using radial basis function surrogate models. Math Comput Appl. 2021;26(2):31.
  11. Parnianifard A, Chaudhary S, Mumtaz S, Wuttisittikulkij L, Imran MA. Expedited surrogate-based quantification of engineering tolerances using a modified polynomial regression. Struct Multidisc Optim. 2023;66(3):6.
  12. Shadab S, Hozefa J, Sonam K, Wagh S, Singh NM. Gaussian process surrogate model for an effective life assessment of transformer considering model and measurement uncertainties. Int J Electr Power Energy Syst. 2022;134:107401.
  13. Lim Y-F, Ng CK, Vaitesswar US, Hippalgaonkar K. Extrapolative Bayesian optimization with Gaussian process and neural network ensemble surrogate models. Adv Intell Syst. 2021;3(11):202170077.
  14. Liu Y, Zhao G, Li G, He W, Zhong C. Analytical robust design optimization based on a hybrid surrogate model by combining polynomial chaos expansion and Gaussian kernel. Struct Multidisc Optim. 2022;65(11):335.
  15. de Paula Garcia R, de Lima BSLP, de Castro Lemonge AC, Jacob BP. An enhanced surrogate-assisted differential evolution for constrained optimization problems. Soft Comput. 2023;27(10):6391–414.
  16. Kaveh M, Mesgari MS. Application of meta-heuristic algorithms for training neural networks and deep learning architectures: a comprehensive review. Neural Process Lett. 2022:1–104. pmid:36339645
  17. Al R, Behera CR, Gernaey KV, Sin G. Stochastic simulation-based superstructure optimization framework for process synthesis and design under uncertainty. Comput Chem Eng. 2020;143:107118.
  18. Marvi-Mashhadi M, Lopes CS, LLorca J. High fidelity simulation of the mechanical behavior of closed-cell polyurethane foams. J Mech Phys Solids. 2020;135:103814.
  19. Wang L, Chen X, Kang S, Deng X, Jin R. Meta-modeling of high-fidelity FEA simulation for efficient product and process design in additive manufacturing. Addit Manuf. 2020;35:101211.
  20. Diao K, Sun X, Lei G, Guo Y, Zhu J. Multimode optimization of switched reluctance machines in hybrid electric vehicles. IEEE Trans Energy Convers. 2021;36(3):2217–26.
  21. Jin Z, Sun X, Cai Y, Zhu J, Lei G, Guo Y. Comprehensive sensitivity and cross-factor variance analysis-based multi-objective design optimization of a 3-DOF hybrid magnetic bearing. IEEE Trans Magn. 2021;57(2):1–4.
  22. Ma C, Qu L. Multiobjective optimization of switched reluctance motors based on design of experiments and particle swarm optimization. IEEE Trans Energy Convers. 2015;30(3):1144–53.
  23. Dos Santos Neto PJ, dos Santos Barros TA, de Paula MV, de Souza RR, Ruppert Filho E. Design of computational experiment for performance optimization of a switched reluctance generator in wind systems. IEEE Trans Energy Convers. 2017;33(1):406–19.
  24. Cai J, Deng ZQ, Qi RY, Liu ZY, Cai YH. A novel BVC-RBF neural network based system simulation model for switched reluctance motor. IEEE Trans Magn. 2011;47(4):830–8.
  25. Sahraoui H, Zeroug H, Toliyat HA. Switched reluctance motor design using neural-network method with static finite-element simulation. IEEE Trans Magn. 2007;43(12):4089–95.
  26. Koziel S, Leifsson L. Surrogate-based aerodynamic shape optimization by variable-resolution models. AIAA J. 2013;51(1):94–106.
  27. Owoyele O, Pal P. A novel machine learning-based optimization algorithm (ActivO) for accelerating simulation-driven engine design. Appl Energy. 2021;285:116455.
  28. Thakur A, Chakraborty S. A deep learning based surrogate model for stochastic simulators. Probabilistic Eng Mech. 2022;68:103248.
  29. Alizadeh R, Allen JK, Mistree F. Managing computational complexity using surrogate models: a critical review. Res Eng Design. 2020;31(3):275–98.
  30. Nyshadham C, Rupp M, Bekker B, Shapeev AV, Mueller T, Rosenbrock CW, et al. Machine-learned multi-system surrogate models for materials prediction. npj Comput Mater. 2019;5(1).
  31. Williams B, Cremaschi S. Novel tool for selecting surrogate modeling techniques for surface approximation. Comput Aided Chem Eng. 2021:451–6.
  32. Davis SE, Cremaschi S, Eden MR. Efficient surrogate model development: impact of sample size and underlying model dimensions. Comput Aided Chem Eng. 2018:979–84.
  33. Queipo NV, Haftka RT, Shyy W, Goel T, Vaidyanathan R, Kevin Tucker P. Surrogate-based analysis and optimization. Prog Aerosp Sci. 2005;41(1):1–28.
  34. Enss GC, Kohler M, Krzyzak A, Platz R. Nonparametric quantile estimation based on surrogate models. IEEE Trans Inform Theory. 2016;62(10):5727–39.
  35. Bramerdorfer G, Zavoianu A-C, Silber S, Lughofer E, Amrhein W. Possibilities for speeding up the FE-based optimization of electrical machines—a case study. IEEE Trans Ind Appl. 2016;52(6):4668–77.
  36. Kim S, Ki S, Bang S, Han S, Seo J, Ahn C, et al. Optimizing energy-efficient jet impingement cooling using an artificial neural network (ANN) surrogate model for high heat flux semiconductors. Appl Therm Eng. 2024;239:122101.
  37. Ferreira S, Gunay B, Wills A, Rizvi F. A neural network-based surrogate model to predict building features from heating and cooling load signatures. J Build Perform Simul. 2024;17(5):631–54.
  38. Baisthakur S, Fitzgerald B. Physics-informed neural network surrogate model for bypassing blade element momentum theory in wind turbine aerodynamic load estimation. Renew Energy. 2024;224:120122.
  39. Li Z, Yu J, Wang C, Bello IT, Yu N, Chen X, et al. Multi-objective optimization of protonic ceramic electrolysis cells based on a deep neural network surrogate model. Appl Energy. 2024;365:123236.
  40. Kolda TG, Lewis RM, Torczon V. Optimization by direct search: new perspectives on some classical and modern methods. SIAM Rev. 2003;45(3):385–482.
  41. Haupt R. Comparison between genetic and gradient-based optimization algorithms for solving electromagnetics problems. IEEE Trans Magn. 1995;31(3):1932–5.
  42. Meza JC. Steepest descent. WIREs Comput Stat. 2010;2(6):719–22.
  43. Nocedal J, Wright SJ. Conjugate gradient methods. Num Optim. 2006:101–34.
  44. Dennis JE Jr, Moré JJ. Quasi-Newton methods, motivation and theory. SIAM Rev. 1977;19(1):46–89.
  45. Nocedal J, Wright SJ. Numerical optimization. New York, NY: Springer; 1999.
  46. Fletcher R. Practical methods of optimization. Wiley; 2013.
  47. Kraft D. A software package for sequential quadratic programming. Forschungsbericht Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt. WorldCat; 1988.
  48. Han SP. A globally convergent method for nonlinear programming. J Optim Theory Appl. 1977;22(3):297–309.
  49. Powell MJD. The convergence of variable metric methods for nonlinearly constrained optimization calculations. Nonlinear Program. 1978;3:27–63.
  50. Meunier L, Rakotoarison H, Wong PK, Roziere B, Rapin J, Teytaud O, et al. Black-box optimization revisited: improving algorithm selection wizards through massive benchmarking. IEEE Trans Evol Computat. 2022;26(3):490–500.
  51. Rapin J, Teytaud O. Nevergrad - a gradient-free optimization platform. GitHub Repository. 2018.
  52. Bennet P, Langevin D, Essoual C, Khaireh-Walieh A, Teytaud O, Wiecha P, Moreau A. An illustrated tutorial on global optimization in nanophotonics. arXiv preprint arXiv:2309.09760. 2023.
  53. Kokash N. An introduction to heuristic algorithms. Department of Informatics and Telecommunications. 2005:1–8.
  54. Holland JH. Genetic algorithms. Sci Am. 1992;267(1):66–72.
  55. Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of ICNN’95 - International Conference on Neural Networks, vol 4, 1995, pp. 1942–8.
  56. Li F, Yue Q, Liu Y, Ouyang H, Gu F. A fast density peak clustering based particle swarm optimizer for dynamic optimization. Expert Syst Appl. 2024;236:121254.
  57. Bertsimas D, Tsitsiklis J. Simulated annealing. Stat Sci. 1993;8(1):10–15.
  58. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 2016:785–94.
  59. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statist. 2001:1189–232.
  60. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
  61. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems 30, 2017.
  62. Rosenbrock HH. An automatic method for finding the greatest or least value of a function. Comput J. 1960;3(3):175–84.
  63. Goodman J, Weare J. Ensemble samplers with affine invariance. CAMCoS. 2010;5(1):65–80.
  64. Laguna M, Martí R. Experimental testing of advanced scatter search designs for global optimization of multimodal functions. J Glob Optim. 2005;33(2):235–55.