Design deep neural network architecture using a genetic algorithm for estimation of pile bearing capacity

Determination of pile bearing capacity is essential in pile foundation design. This study focused on the use of evolutionary algorithms to optimize a Deep Learning Neural Network (DLNN) algorithm to predict the bearing capacity of driven piles. For this purpose, a Genetic Algorithm (GA) was developed to select the most significant features in the raw dataset. A GA-DLNN hybrid model was then developed to select optimal parameters for the DLNN model, including the network algorithm, the activation function for hidden neurons, the number of hidden layers, and the number of neurons in each hidden layer. A database containing 472 driven pile static load test reports was used. The dataset was divided into three parts, namely the training set (60%), the validation set (20%), and the testing set (20%), for the construction, validation, and testing phases of the proposed model, respectively. Various quality assessment criteria, namely the coefficient of determination (R2), Index of Agreement (IA), mean absolute error (MAE), and root mean squared error (RMSE), were used to evaluate the performance of the machine learning (ML) algorithms. The GA-DLNN hybrid model was shown to be able to find the most optimal set of parameters for the prediction process. The results showed that the hybrid model using only the most critical features gave higher accuracy than the hybrid model using all input variables.


Introduction
In pile foundation design, the axial pile bearing capacity (P u) is considered one of the most critical parameters [1]. Throughout years of research and development, five main approaches to determining the pile bearing capacity have been adopted, namely static analysis, dynamic analysis, dynamic testing, pile load testing, and in-situ testing [2]. Each of these methods has advantages and disadvantages. The pile load test is considered one of the best methods to determine the pile bearing capacity because the testing process is close to the working mechanism of driven piles [3]. However, this method remains time-consuming and unaffordable for small projects [3], so the development of a more feasible approach is vital. Thus, many studies have been conducted to determine the pile bearing capacity by taking advantage of in-situ test results [4]. Meanwhile, the European standard (Eurocode 7) [5] recommends several ground field tests, such as the dynamic probing test (DP), the press-in and screw-on probe test (SS), the standard penetration test (SPT), pressuremeter tests (PMT), the plate loading test (PLT), the flat dilatometer test (DMT), the field vane test (FVT), and cone penetration tests with the measurement of pore pressure (CPTu). Among these approaches, the SPT is commonly used to determine the bearing capacity of piles [6].
Many contributions in the literature relying on SPT results have been suggested to predict the bearing capacity of piles. As examples, Meyerhof [7], Bazaraa and Kurkur [8], Robert [9], Shioi and Fukui [10], and Shariatmadari et al. [11] have proposed empirical formulations for determining the bearing capacity of piles in sandy ground. Besides, Lopes and Laprovitera [12], Decort [13], and the Architectural Institute of Japan (AIJ) [14] have introduced formulations to determine the pile bearing capacity for various types of soil, including sandy and clayey ground. Overall, traditional methods have used several main parameters to estimate the mechanical properties of piles, such as pile diameter, pile length, soil type, and the number of SPT blow counts of each soil layer. However, the choice of appropriate parameters, along with the failure to cover other parameters, has led to disagreement among the results given by these methods [15]. Therefore, the development of a universal approach for selecting a suitable set of parameters is imperative.
It is worth noticing that the artificial neural network (ANN) algorithm has gained intense attention for treating design issues in pile foundations. For example, Goh et al. [38,39] presented an ANN model to predict the friction capacity of driven piles in clays, in which the algorithm was trained on field data records. Besides, Shahin et al. [40][41][42][43] used ANN models to predict the loading capacity of driven piles and drilled shafts using a dataset containing in-situ load tests along with CPT results. Moreover, Nawari et al. [44] presented an ANN algorithm to predict the settlement of drilled shafts based on SPT data and shaft geometry. Momeni et al. [45] developed an ANN model to predict the axial bearing capacity of concrete piles using Pile Driving Analyzer (PDA) records from project sites. Last but not least, Pham et al. [15] developed an ANN algorithm and a Random Forest (RF) to estimate the axial bearing capacity of driven piles. Regarding other ML models, Support Vector Machine Regression (SVR) combined with a "nature-inspired" meta-heuristic algorithm, namely Particle Swarm Optimization (PSO-SVR) [46], has been used to predict soil shear strength. Furthermore, Pham et al. [47] presented a hybrid ML model combining RF and PSO (PSO-RF) to predict the undrained shear strength of soil. Also, Momeni et al. [48] developed an ANN-based predictive model optimized with the Genetic Algorithm (GA) technique to choose the best weights and biases of the ANN model for predicting the bearing capacity of piles. In addition, Hossain et al. [49] used a GA to optimize the parameters of a three-hidden-layer deep belief neural network (DBNN), including the number of epochs, the number of hidden units, and the learning rates in the hidden layers.
It is interesting to notice that all these studies have confirmed the effectiveness of hybrid ML models as practical and efficient tools for solving geotechnical problems, particularly for the axial bearing capacity of piles. Despite the recent successes of machine learning, the method has some limitations to keep in mind: it requires large amounts of hand-crafted, structured training data and cannot learn in real time. In addition, ML models still lack the ability to generalize to conditions other than those encountered during training. Therefore, an ML model only predicts correctly within a certain data range and is not generalized to all cases.
With a particular interest in the recently developed Deep Learning Neural Network (DLNN), which has gained tremendous success in many areas of application [50][51][52][53][54], the main objective of this study is the development of a novel hybrid ML algorithm using DLNN and GA to predict the axial load capacity of driven piles. For this aim, a dataset consisting of 472 pile load test reports from construction sites in Ha Nam, Vietnam, was gathered. The database was then divided into training, validation, and testing subsets, corresponding to the learning, validation, and testing phases of the ML models. Next, a novel ML algorithm using a GA-DLNN hybrid model was developed. A GA-based model is first used to select the most important input variables and create a new, smaller dataset, because many unimportant input variables can reduce the accuracy of the output forecast. The GA-DLNN hybrid model is then used to optimize the parameters of the DLNN model, including the number of hidden layers, the number of neurons in each hidden layer, the activation function for the hidden layers, and the training algorithm. The optimal DLNN architecture is tested with the new dataset and compared with the case using the full set of input variables. Various error criteria, namely the mean absolute error (MAE), root mean squared error (RMSE), coefficient of determination (R 2), and Index of Agreement (IA), were applied to evaluate the prediction capability of the algorithms. In addition, 1000 simulations involving random shuffling of the dataset were conducted for each model in order to evaluate the accuracy of the final DLNN model precisely.

Significance of the research study
The numerical or experimental methods in the existing literature still have some limitations, such as small datasets (Marto et al. [55] with 40 samples; Momeni et al. [45] with 36 samples; Momeni et al. [56] with 150 samples; Bagińska and Srokosz [57] with 50 samples; Teh et al. [58] with 37 samples), limited refinement of the ML approaches, or failure to fully consider the key parameters which affect the predicting results of the model.
Against this background, the contribution of the present work can be summarized as follows: (i) a large dataset, including 472 experimental tests; (ii) a reduction of the input variables from 10 to 4, which helps the model achieve more accurate results with a faster training time; and (iii) an automatically designed optimal architecture for the DLNN model, in which all key parameters are considered, including the number of hidden layers, the number of neurons in each hidden layer, the activation function, and the training algorithm. In particular, the number of hidden layers is not fixed but can be selected through cross-mating between parents with different chromosome lengths. Besides, the randomness in the order of the training data is also considered, to assess the stability of the predictions of the models on the training, validation, and testing sets.

Experimental measurement of bearing capacity
The experimental database used in this study was derived from pile load tests conducted on 472 reinforced concrete piles at the test site in Ha Nam province, Vietnam (Fig 1A). In order to obtain the measurements, pre-cast square-section piles with closed tips were driven into the ground by a hydraulic pile press machine at a constant rate of penetration. The tests started at least 7 days after the piles had been driven, and the experimental layout is depicted in Fig 1B. The load was increased gradually in each pile test. Depending on the design requirements, the load could be varied up to 200% of the pile design load. The time required to reach 100%, 150%, and 200% of the load could last about 6 h, 12 h, and 24 h, respectively. The bearing capacity of the piles was determined following two principles: (i) when the settlement of the pile top at the current load level was 5 times or more the settlement of the pile top at the previous load level, the pile bearing capacity was taken as the given failure load; (ii) when the load-settlement curve was nearly linear at the last load level, condition (i) could not be used, and the pile bearing capacity was instead approximated as the load level at which the settlement of the pile top exceeded 10% of the pile diameter.
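As a compact illustration, the two failure criteria above can be sketched as a function that scans a load-settlement record. This is a hypothetical helper written for this text, not code from the study; it assumes the loads and settlements are parallel lists ordered by load step, with settlements and diameter in the same length unit.

```python
def pile_capacity(loads, settlements, diameter):
    """Estimate pile bearing capacity from a static load test record
    using the two principles described above (illustrative sketch)."""
    # Criterion (i): settlement at the current step is 5x or more the
    # settlement at the previous step -> take that step as the failure load.
    for i in range(1, len(loads)):
        if settlements[i - 1] > 0 and settlements[i] >= 5 * settlements[i - 1]:
            return loads[i]
    # Criterion (ii): otherwise, take the load at which total settlement
    # first exceeds 10% of the pile diameter.
    for load, s in zip(loads, settlements):
        if s > 0.1 * diameter:
            return load
    return None  # no failure observed within the applied load range
```

For example, with settlements of 1, 2, and 12 mm at 100, 200, and 300 kN, criterion (i) fires at the third step and the capacity is taken as 300 kN.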

Data preparation
The primary goal of the development of the ML algorithms is to estimate the axial bearing capacity of the pile accurately. Therefore, as a first attempt, all the known factors affecting the pile bearing capacity were considered. Most traditional approaches have used three groups of parameters: the pile geometry, the pile constituent material properties, and the soil properties [7][8][9][10][11][12][13][14]. It is worth noticing that the depth of the water table was not considered, since this effect has been shown to be already accounted for in the SPT blow counts [59]. The bearing capacity of the piles was predicted based on the soil properties, determined through the SPT blow counts (N) along the embedded length of the pile. In this study, the average number of SPT blows along the pile shaft (N sh) and at the tip (N t) was used. In addition, following Meyerhof's recommendation (1976) [7], the average SPT value (N t) for 8D above and 3D below the pile tip was also utilized, where D represents the pile diameter.
Consequently, the input variables in this work were: (1) pile diameter (D); (2) thickness of the first soil layer the pile is embedded in (Z 1); (3) thickness of the second soil layer (Z 2); (4) thickness of the third soil layer (Z 3); (5) elevation of the natural ground (Z g); (6) elevation of the pile top (Z p); (7) elevation of the extra segment of the pile top (Z t); (8) depth of the pile tip (Z m); (9) the average SPT blow count along the pile shaft (N sh); and (10) the average SPT blow count at the pile tip (N t). The axial pile bearing capacity was considered as the single output (P u). For illustration purposes, a diagram of the soil stratigraphy and the input and output parameters is depicted in Fig 2. The dataset containing 472 samples is statistically summarized in Table 1, including the number of pile tests and the min, max, average, and standard deviation of the input and output variables. As shown in Table 1, the pile diameter (D) ranged from 300 to 400 mm, and the thickness of the first soil layer the pile is embedded in (Z 1) ranged from 3.4 m to 5.7 m; the ranges of the remaining variables are given in Table 1. In this study, the collected dataset was divided into training, validation, and testing datasets. The training part (60% of the total data) was used to train the ML models. The validation part (20% of the total data) was used to estimate model skill and tune the model's hyperparameters, whereas the testing data (the remaining 20%), which was unknown during the training and validation phases, was used to validate the performance of the ML models.

Deep learning neural network (DLNN) with multi-layer perceptron
The multi-layer perceptron (MLP) is a kind of feedforward artificial neural network [60]. In general, the MLP includes at least three units, called layers: the input layer, the hidden layer, and the output layer. When there are more than two hidden layers, the multi-layer perceptron can be called a Deep Learning Neural Network (DLNN) [61,62]. In a DLNN, each node in a layer is associated with a certain weight, denoted w ij, with every node in the adjacent layers, creating a fully linked neural system [63]. Except for the input layer, each node is a neuron that uses a non-linear activation function [64]. Besides, the MLP uses a supervised learning technique called backpropagation for the training process [64]. Thanks to its multiple layers and non-linear activation functions, a DLNN can distinguish data that are not linearly separable. Fig 4 shows the DLNN architecture used in this investigation, consisting of 10 inputs, three hidden layers, and one output variable. A multi-layer perceptron with a linear activation function on all neurons represents a linear network linking the weighted inputs to the output; using linear algebra, it can be shown that such a network, with any number of layers, reduces to a two-layer input-output model. Therefore, using non-linear activation functions in the DLNN network is crucial to enhance the accuracy of the model and better mimic the working mechanism of biological neurons. Sigmoid functions are commonly adopted in DLNN networks, with two conventional activation functions:

y(v_i) = tanh(v_i)    and    y(v_i) = 1 / (1 + e^(-v_i))

The first one is a hyperbolic tangent, ranging from -1 to 1, whereas the second one is a logistic function with a similar shape but ranging from 0 to 1. In these functions, y(v_i) represents the output of the i-th node, and v_i is the weighted sum of the input connections.
Besides, alternative activation functions, such as the rectifier, or more specialized functions, namely radial basis functions, have also been proposed.
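The two sigmoid activations above can be written out directly; a useful sanity check is the standard identity relating them, logistic(v) = (tanh(v/2) + 1) / 2. The function names below are chosen for this example.

```python
import math

def tanh_act(v):
    # Hyperbolic tangent activation: output in (-1, 1)
    return math.tanh(v)

def logistic_act(v):
    # Logistic (sigmoid) activation: same S-shape, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-v))

print(tanh_act(0.0), logistic_act(0.0))  # 0.0 0.5
```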
The connection weights and biases are adjusted as a function of the error of the output compared with the target; this adjustment is how learning occurs. It is an example of supervised learning using the least-mean-squares algorithm, generalized as the backpropagation algorithm. Precisely, the error at output node j for the n-th data point is

e_j(n) = d_j(n) - y_j(n)

where d refers to the target value and y denotes the value generated by the perceptron system. The node weights are then adjusted to minimize the total error of the predicted output,

E(n) = (1/2) Σ_j e_j(n)^2

Using the gradient descent algorithm, the change, or correction, for each weight is

Δw_ji(n) = -η (∂E(n)/∂v_j(n)) y_i(n)

where y_i denotes the output of the previous neuron and η refers to the learning rate, chosen so that the error converges quickly without oscillation. The derivative is calculated based on the induced local field v_j. For an output node, it simplifies to

-∂E(n)/∂v_j(n) = e_j(n) φ′(v_j(n))

where φ′ is the derivative of the activation function. For a weight feeding a hidden node, the relevant derivative is

-∂E(n)/∂v_j(n) = φ′(v_j(n)) Σ_k ( -∂E(n)/∂v_k(n) ) w_kj(n)

which depends on the weight changes of the nodes k of the output layer. This reflects the backpropagation of the error: the output weights change according to the derivative of the activation function, and the weights of the hidden layer then change accordingly.
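A single weight-update step for one sigmoid output neuron follows directly from the equations above: e = d - y, the local gradient is e·φ′(v) with φ′(v) = y(1 - y) for the logistic function, and each weight moves by η·(local gradient)·x_i. This is a didactic sketch, not the paper's training code.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def backprop_step(w, x, d, eta=0.5):
    """One delta-rule update for a single sigmoid output neuron:
    e = d - y, delta = e * phi'(v), dw_i = eta * delta * x_i."""
    v = sum(wi * xi for wi, xi in zip(w, x))   # induced local field v
    y = sigmoid(v)                             # neuron output
    e = d - y                                  # output error e_j(n)
    delta = e * y * (1.0 - y)                  # local gradient: phi'(v) = y(1 - y)
    return [wi + eta * delta * xi for wi, xi in zip(w, x)], e

# Repeating the update drives the output error toward zero.
w, e = [0.1, -0.2], 1.0
for _ in range(500):
    w, e = backprop_step(w, [1.0, 0.5], 0.8)
print(abs(e) < 0.05)  # True: the error has shrunk
```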

Genetic Algorithm (GA)
Holland was the first researcher to propose the genetic algorithm (GA), a stochastic search and optimization technique [65]. Later, the GA was investigated by other scientists, especially Deb et al. [66] and Houck et al. [67]. Generally, the GA is considered a simple solution for complex non-linear problems [68]. The basis of the method lies in the process of mating and breeding in an initial population, along with operations such as selection, crossover, and mutation, which help create new, more optimal individuals [69]. In the GA, the population size is an important factor reflecting the total number of solutions and significantly affects the results of the problem [70], whereas the so-called "generations" refer to the iterations of the optimization process. This process can be terminated by several selected stopping criteria [71]. Practically, the GA has shown many benefits in finding an optimal resource set to optimize both cost and production [69]. In the field of construction, especially when evaluating the load capacity of piles, many studies have successfully and efficiently used the GA. As an example, Ardalan et al. [72] used a GA combined with a neural network to predict the unit shaft resistance of driven piles from pile loading tests. In another study, 50 PDA (Pile Driving Analyzer) tests conducted on pre-cast concrete piles were used to predict the pile bearing capacity; the proposed hybrid method provided excellent results, with an R 2 of 0.99 [71]. Moreover, other studies on the behavior of piles in soil have used the GA, and its effectiveness has been clearly proven [68, 70, 72-74].
In this work, the GA was used as an optimization technique to optimize the DLNN for predicting the bearing capacity of driven piles. The pseudo-algorithm is summarized below.
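The selection-crossover-mutation loop of the pseudo-algorithm can be sketched as a minimal GA skeleton. This is illustrative only: the paper's GA wraps DLNN training as its fitness function, whereas here a toy fitness (the number of 1-bits in a binary chromosome) stands in, and all parameter values are arbitrary.

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=20, generations=50,
                      p_mut=0.1, seed=0):
    """Minimal GA skeleton: rank-based selection, one-point crossover,
    per-gene mutation. Maximizes `fitness` over binary chromosomes."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # rank by fitness
        parents = pop[:pop_size // 2]            # selection: keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            pa, pb = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)      # one-point crossover
            child = pa[:cut] + pb[cut:]
            for i in range(n_genes):             # mutation
                if rng.random() < p_mut:
                    child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children                 # next generation
    return max(pop, key=fitness)

best = genetic_algorithm(sum, n_genes=10)  # toy fitness: count of 1-bits
print(sum(best))  # expected to reach (or closely approach) 10
```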

Feature selection with GA
It is well known that training a DLNN is a time-consuming and costly process due to the computer resources required [75,76]. In addition, some features in the dataset might affect the regression results, and unnecessary features might generate noise and reduce prediction accuracy [77]. Selecting appropriate features by exhaustive search requires considerable effort; for instance, the sum of the combinations C(10, i) for i from 1 to 10, i.e., 1023 candidate subsets, could be generated from a dataset containing 10 features. In order to facilitate the feature selection process, the GA was used to choose the appropriate features within the dataset, expecting that fewer input variables could enhance the prediction accuracy of GA-DLNN. The detailed selection mechanism is summarized in the following parts.
Firstly, genes inside the chromosome should be selected. In this study, each feature affecting the pile bearing capacity is considered as a gene. As a result, the length of the chromosome is 10, corresponding to 10 features, or 10 genes (Fig 5).
Within the chromosome, each gene is associated with a binary value, i.e., 1 when the feature is selected and 0 otherwise [78]. Next, to create the population, the original chromosomes are randomly generated [78]. After that, several parents are chosen for mating to create offspring chromosomes based on the fitness value associated with each solution (i.e., chromosome). The fitness value is calculated using a fitness function; support vector regression (SVR) is chosen as the fitness model for this investigation. The regression model is trained on the training dataset and evaluated on the validation (or testing) dataset, and the mean absolute error (MAE) cost function is used to evaluate its accuracy. A lower fitness value indicates a better solution. Based on the fitness function, the "parents" are filtered from the current population. The essence of the GA lies in the hypothesis that mating two good solutions could produce an even better one [79]. Children can randomly inherit their parents' genes, and mutations are then applied to introduce new genes in the next generation.
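The chromosome encoding and the MAE fitness can be sketched as follows. The SVR fitness model of the study is stood in for by an arbitrary `train_and_predict` callable, and the helper names are invented for this example.

```python
def mask_features(X, chromosome):
    """Apply a binary chromosome to a dataset: gene i == 1 keeps feature i
    (sketch of the encoding in Fig 5)."""
    keep = [i for i, g in enumerate(chromosome) if g == 1]
    return [[row[i] for i in keep] for row in X]

def mae_fitness(chromosome, X_val, y_val, train_and_predict):
    """Lower is better: predict on the masked validation features and
    return the mean absolute error against the validation targets."""
    preds = train_and_predict(mask_features(X_val, chromosome))
    return sum(abs(p - y) for p, y in zip(preds, y_val)) / len(y_val)

# toy check: a 10-gene chromosome keeping features 0 and 3
X = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
chrom = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(mask_features(X, chrom))  # [[1, 4]]
```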

Evolution of DLNN parameters using GA and parameters tuning process
It is universally challenging to find an optimal neural network architecture, and this problem has been the subject of intense research. To date, no universal rules define the proper number of hidden layers, the number of neurons in each hidden layer, or the functions connecting the neurons. Considering that, in the DLNN algorithm, a vast number of combinations can be assembled to build the final network structure, exhaustive selection becomes unachievable. To overcome this problem, the GA can be used to find the best DLNN architecture automatically. The mechanism is summarized as follows.
Firstly, the genes inside the chromosome are determined. Four parameters are investigated: (i) the network optimizer algorithm, (ii) the activation function of the hidden layers, (iii) the number of hidden layers, and (iv) the number of neurons in each hidden layer. As the number of neurons differs between hidden layers, more genes are required: each such gene contains the number of neurons in one hidden layer. If the maximum number of hidden layers is P 2, then the maximum length of the chromosome is L = (3 + P 2). In particular, the first three genes encode the first three parameters presented above. It is worth noticing that, in this scheme, chromosomes have different lengths, depending on the corresponding number of hidden layers. The parameters used for the DLNN architecture are depicted in Fig 6: the network optimizer algorithm (P 0), the activation function of the hidden layers (P 1), the number of hidden layers (P 2), and the number of neurons in each hidden layer (P 3 . . . P L).
The fitness function is the DLNN model itself, along with four cost functions to evaluate performance, namely R 2, IA, MAE, and RMSE; detailed descriptions of these criteria are given in the next section. Given that chromosome lengths may differ, mating proceeds under the following principles: (i) if the parents' chromosomes have the same length, the child randomly selects the number of hidden layers and the number of neurons from either the father or the mother; (ii) if the parents' chromosomes have different lengths, two cases arise. In the first case, the child takes the number of hidden layers from the parent with fewer genes, and each gene is selected randomly from the parents. In the second case, the child takes the number of hidden layers from the parent with more genes; the genes missing from the shorter chromosome must then come from the parent with the longer chromosome, while the other genes are taken randomly from either parent. The mating process is highlighted in Fig 7. During the mutation process, a few children are selected, and a random gene is replaced with another random value within a given range. In particular, since the DLNN model has many parameters, the mutation rate is set at 50% of the children born in each generation in order to maximize the chance of finding the best genes. Finally, the parameters of the DLNN were finely tuned by the GA through the population generations to find the best prediction performance. Table 3 summarizes the tuned parameters and their tuning ranges and options.
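The variable-length mating rules above can be sketched in code. This is an interpretive sketch under the stated rules: the chromosome layout [P0, P1, P2, n1, n2, ...] follows Fig 6, but the gene values used in the example (optimizer and activation names, neuron counts) are placeholders, not the paper's exact encodings.

```python
import random

def mate(parent_a, parent_b, rng=None):
    """Crossover for variable-length architecture chromosomes
    [P0, P1, P2, n1, n2, ...], where P2 is the number of hidden layers
    and n_i the neuron count of layer i (sketch of the rules in Fig 7)."""
    rng = rng or random.Random(0)
    longer, shorter = ((parent_a, parent_b)
                       if len(parent_a) >= len(parent_b)
                       else (parent_b, parent_a))
    # The child takes its hidden-layer count P2 (hence its length)
    # from a randomly chosen parent.
    n_hidden = rng.choice([parent_a, parent_b])[2]
    child = []
    for i in range(3 + n_hidden):
        if i == 2:
            child.append(n_hidden)        # the chosen layer count
        elif i < len(shorter):
            # positions both parents have: pick randomly from either
            child.append(rng.choice([parent_a[i], parent_b[i]]))
        else:
            # positions only the longer parent has: must come from it
            child.append(longer[i])
    return child

a = ["quasi-newton", "relu", 2, 16, 8]       # 2 hidden layers
b = ["sgd", "tanh", 4, 32, 16, 8, 4]         # 4 hidden layers
child = mate(a, b)
print(len(child) == 3 + child[2])  # True: length is consistent with P2
```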

Performance evaluation
In order to verify the effectiveness and performance of the ML algorithms, four different criteria were selected in this study, namely the root mean square error (RMSE), the mean absolute error (MAE), the coefficient of determination (R 2), and Willmott's index of agreement (IA). RMSE is the square root of the mean squared difference between the predicted outputs and the targets, whereas MAE is the mean magnitude of the errors; for both, values closer to 0 indicate better model performance. R 2 measures the correlation between targets and outputs [80]. Its values lie in the range [-1, 1], with higher accuracy obtained when the values are close to 1. The Index of Agreement (IA) was introduced by Willmott [81,82] and relates the mean square error to the potential error. Similar to R 2, the values of IA vary between -1 and 1, where 1 indicates a perfect correlation and negative values indicate no agreement. These coefficients can be calculated using the following formulas [83,84]:

RMSE = sqrt( (1/k) Σ_{i=1..k} (v_i - v̂_i)^2 )

MAE = (1/k) Σ_{i=1..k} |v_i - v̂_i|

R^2 = 1 - Σ_{i=1..k} (v_i - v̂_i)^2 / Σ_{i=1..k} (v_i - v̄)^2

IA = 1 - Σ_{i=1..k} (v_i - v̂_i)^2 / Σ_{i=1..k} ( |v̂_i - v̄| + |v_i - v̄| )^2

where k is the number of samples, v_i and v̂_i are the actual and predicted outputs, respectively, and v̄ is the average of the v_i.
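The four criteria can be computed directly from their formulas. This sketch uses the 1 - SSE/SST form for R 2 and Willmott's IA as given above; a perfect prediction yields RMSE = MAE = 0 and R 2 = IA = 1.

```python
import math

def metrics(actual, predicted):
    """RMSE, MAE, R2 (as 1 - SSE/SST), and Willmott's Index of Agreement,
    matching the formulas above (plain-Python sketch)."""
    k = len(actual)
    mean_v = sum(actual) / k
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    rmse = math.sqrt(sse / k)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / k
    r2 = 1.0 - sse / sum((a - mean_v) ** 2 for a in actual)
    ia = 1.0 - sse / sum((abs(p - mean_v) + abs(a - mean_v)) ** 2
                         for a, p in zip(actual, predicted))
    return rmse, mae, r2, ia

print(metrics([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # (0.0, 0.0, 1.0, 1.0)
```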

Feature selection
The results of the feature selection process using the GA model are presented in this section. The initialization parameters of the GA used in this study are given in Table 4. After the selection process, the compact dataset included 4 variables; as a result, the input space was reduced by 6 variables compared with the original dataset.

Optimization of DLNN architecture
The evolutionary results of the GA-DLNN model in predicting the pile bearing capacity are evaluated in this section. The initialization parameters of GA-DLNN used in this study are given in Table 5. Fig 9 illustrates the evolution of the GA-DLNN model through 200 generations with 4 and 10 input variables, and a summary of the best predictive performance of the models is presented in Table 6. For the sake of comparison, and to highlight the performance of the reduced input space, three different scenarios were examined. The first used the 4-variable input space simulated with GA-DLNN, denoted the 4-input GA-DLNN model. The second used the initial input space with GA-DLNN, denoted the 10-input GA-DLNN model. The last used the 4 input variables but with a plain DLNN as the predictor, denoted the 4-input DLNN model. The results show that the 4-input GA-DLNN model gives slightly better performance than the 10-input GA-DLNN model. The optimal parameters selected for the models are given in Table 7: all three models choose the same network optimization algorithm (Quasi-Newton), the number of hidden layers ranges from 2 to 4, and the number of neurons in each hidden layer is relatively varied, ranging from 9 to 80; however, each model chooses a different type of activation function. Fig 10 shows a visual comparison of test results and predictions of P u from representative ML models. The performance of the ML models was tested on all three datasets: training, validation, and testing. Two representative DLNN models were selected based on the best performance through the model evolution (Fig 9), corresponding to 4 and 10 input variables. One 4-input DLNN model, which had the best fitness value in the first generation, was chosen for comparison with the two optimal models to prove the effectiveness of the model evolution. The predictive capability of the models is also summarized in Table 8.

Predictive capability of the models
From a statistical standpoint, the performance of the ML algorithms should be fully evaluated. As mentioned, during the simulation 60% of the data was randomly selected to train the ML models, and the performance of such a model can be affected by the selection order of the training dataset. Therefore, a total of 1000 simulations were performed, taking into account the random splitting of the dataset. The results are shown in Fig 11 and Tables 9-12. It can be seen that the performance of the 4-input GA-DLNN model improved after tuning the parameters of the DLNN model and outperformed the best model of the first generation (4-input DLNN). On the training set, the R 2 value increased from 0.919 to 0.932. The same can be observed on the validation set, where the R 2 value increased from 0.884 to 0.898. The largest difference is seen on the testing set, where R 2 increased from 0.777 to 0.882. Compared with the 10-input GA-DLNN model, the R 2 values are similar in training and validation; the big difference appears only on the test dataset, where the 4-input GA-DLNN model gives better results (R 2 = 0.882) than the 10-input GA-DLNN model (R 2 = 0.8). On the testing set, the SD value of the 4-input GA-DLNN model is the smallest (SD = 0.008), compared with the 10-input GA-DLNN and 4-input DLNN models (SD = 0.0351 and 0.0718, respectively), indicating more stable 4-input GA-DLNN modelling. Table 13 presents some research results on ML applications in foundation engineering. The results of this study, as well as of previous studies, show that ML techniques predicting foundation loads mostly reach R 2 values from 0.8 to 0.9 on the test dataset. However, because different datasets were used, a direct comparison between these results is unwarranted; a project using shared datasets is needed to produce a generalized model for foundation engineering.
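The 1000-simulation procedure, re-running a model over many random shuffles and summarizing the mean and standard deviation of a score, can be sketched as below. The `score_fn` callable stands in for training a model and returning its R 2; the toy score and run count here are arbitrary.

```python
import random
import statistics

def repeated_split_scores(samples, score_fn, n_runs=1000, train_frac=0.6):
    """Re-run `score_fn(train, test)` over many random shuffles of the
    dataset and summarize the score (illustrative sketch of the
    1000-simulation stability study)."""
    scores = []
    for run in range(n_runs):
        idx = list(range(len(samples)))
        random.Random(run).shuffle(idx)       # a new random ordering per run
        cut = int(train_frac * len(samples))
        train = [samples[i] for i in idx[:cut]]
        test = [samples[i] for i in idx[cut:]]
        scores.append(score_fn(train, test))
    return statistics.mean(scores), statistics.pstdev(scores)

# toy score: fraction of even numbers landing in the test split
mean, sd = repeated_split_scores(
    list(range(100)),
    lambda tr, te: sum(x % 2 == 0 for x in te) / len(te),
    n_runs=200)
print(round(mean, 2))  # close to 0.5, with a small spread across runs
```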


Conclusions
The main achievement of this study is an efficient GA-DLNN hybrid model for predicting pile load capacity. The model has the ability to self-evolve to find the optimal model structure, where the optimal number of hidden layers is treated as a variable and discovered during the model's evolution, alongside the other important parameters. In addition, an evolutionary model was developed to reduce the number of input variables of the model while preserving the accuracy of the regression results.
The results showed that, on the training dataset, all three models (4-input GA-DLNN, 10-input GA-DLNN, and 4-input DLNN) predict well, with the 4-input GA-DLNN model in the lead. Meanwhile, the time cost of the 4-input GA-DLNN model is much lower than that of the 10-input GA-DLNN hybrid model (normalized times of 0.7 and 1.0, respectively). On the testing data, the predictive ability of the 4-input GA-DLNN model proved superior to the other two models: over 1000 simulations, the average R 2 value was 0.882, 0.8, and 0.777 for the 4-input GA-DLNN, 10-input GA-DLNN, and 4-input DLNN models, respectively. In addition, the oscillation range (minimum to maximum) of the R 2 value of the 4-input GA-DLNN model is smaller than those of the other two models, indicating the model's stability.
The research shows that the best results are obtained by GA-DLNN with 2 to 4 hidden layers. The number of neurons differs between hidden layers and is distributed in a complex way across them. This suggests that a DLNN model with 2, 3, or 4 hidden layers might be optimal for predicting the bearing capacity of driven piles. However, it is recommended to select the number of neurons in each hidden layer by evolutionary methods to obtain high performance from the DLNN model. The results of evolving the DLNN model by GA also show that the activation function of the hidden layers is mainly one of two types, ReLU or logistic, and that the Quasi-Newton optimization algorithm is the most suitable for predicting the bearing capacity of piles.
Supporting information S1 File.