An approach for predicting the compressive strength of cement-based materials exposed to sulfate attack

In this paper, a support vector machine (SVM) model was established to predict the compressive strength of mortars exposed to sulfate attack. An accelerated corrosion test was used to collect compressive strength data, and a total of 638 data samples obtained from the experiment were used as the dataset for establishing the SVM model. The coefficient of determination, the mean absolute error, the mean absolute percentage error and the root mean square error were used to evaluate the predictive accuracy. The main factors affecting the predicted compressive strength were identified by sensitivity analysis. The SVM model was calibrated and validated, and its performance was compared with that of an artificial neural network (ANN) model. The results show that the values predicted by the SVM model were close to the experimental values; that the predicted compressive strength was most sensitive to exposure time, water-cement ratio and sulfate ion concentration; and that the SVM model performed better than the ANN model. The SVM model developed in this study can potentially be used for predicting the compressive strength of cement-based materials serving in harsh environments.


Introduction
Concrete is an important structural material used in civil engineering and industrial facilities. Strength is considered one of the most important properties of a given concrete mix design. Besides the constituent materials, strength is also affected by environmental exposure and extreme working conditions [1]. In harsh environments, especially in areas with abundant sulfate ions, the properties of concrete can deteriorate easily, which can affect the safety of engineering structures [2]. For the safety assessment of existing structures, compressive strength is often considered the most important indicator of concrete quality [3,4]. Monitoring concrete strength during service can indicate when quality control and maintenance are needed [5]. In addition, predicting concrete strength can help in assessing the deterioration of concrete structures and increasing their safety [6]. Methods for predicting and estimating concrete strength in real time are therefore important. Unfortunately, owing to the complex degradation mechanisms and the many influencing parameters [7], there is no effective method for predicting the compressive strength of cement-based materials in harsh environments. To date, methods for predicting concrete compressive strength fall into two categories [8]. The first category comprises traditional statistical forecasting methods. These need a very large amount of data: their predictions approach the true results only as the sample size tends to infinity, whereas the number of samples actually available is often limited. The second category, nonlinear prediction methods, lacks a unified mathematical theory, and the predicted results are often a local optimum rather than a global optimum.
For conventional concrete, both categories can be used to predict compressive strength. For concrete exposed to sulfate attack, however, the relationship between the input factors and the compressive strength becomes highly nonlinear and complex as the number of input factors increases. Hence, regression models are not suitable for predicting the compressive strength of concrete in harsh environments, and more attention has been paid to models based on artificial intelligence. Machine learning techniques such as artificial neural networks (ANNs) are increasingly used to simulate the strength of concrete materials and have become an important research area [9][10][11][12].
An ANN model usually consists of inputs, weights, a sum function, an activation function and outputs. The related algorithms require several learning parameters to be set, together with the number of hidden layers and the optimal number of nodes in each hidden layer. To date, the back-propagation (BP) algorithm, which adjusts connection weights and bias values during training, has been widely used for training ANN models. However, ANN models still have disadvantages: (1) they cannot provide information about the relative significance of the various parameters [13]; (2) a reasonable interpretation of the overall structure of the network is hard to establish [14]. In addition, ANN models have intrinsic disadvantages such as slow convergence, poor generalization, convergence to local minima and over-fitting [15]. To overcome these limitations, researchers have in recent years explored the potential of support vector machines (SVMs) for predicting the performance of cement-based materials.
SVM, a nonlinear modeling approach proposed by Vapnik [16] on the basis of statistical learning theory, is being applied in the field of civil engineering. Unlike an ANN model, an SVM model has the advantages of reducing the training error and of yielding a unique, globally optimal solution [17]. The method has excellent generalization capability when solving nonlinear problems, and it can also cope with small sample sizes. For example, Yan and Shi [18] used SVM to predict the elastic modulus of normal and high strength concrete, and their results showed that the SVM outperformed other models. Chou et al. [19] predicted the compressive strength of high performance concrete using the SVM technique and investigated its behavior simulation capability using concrete data from several countries. Cheng et al. [20] proposed an advanced hybrid AI model that fused fuzzy logic, weighted SVM and fast messy genetic algorithms to predict the compressive strength of concrete. Gupta [21] investigated the potential use of SVM for predicting concrete compressive strength (CCS) by combining a radial basis function with SVM.
There have been few studies on the prediction of the compressive strength of cement-based materials exposed to sulfate attack using SVM. Most existing studies established prediction models of concrete compressive strength mainly on the basis of material factors (e.g., water-binder ratio, water content and aggregate content) and curing age [9][10][11][12][18][19][20]. However, for concrete serving in a harsh environment, environmental factors are of great importance to the compressive strength and should be considered in compressive strength prediction models.
To predict concrete compressive strength accurately using SVM, a data set with a large amount of experimental data is required. Compared with field testing, indoor accelerated corrosion testing can be performed in a controlled environment and has thus been widely applied to obtain the compressive strength of concrete quickly [22,23].
In this study, a support vector machine was applied to predict the compressive strength of cement-based materials exposed to sulfate attack. To establish an SVM model for predicting compressive strength, an accelerated corrosion test of mortars with different water-cement (w/c) ratios was carried out, and 638 sets of data were collected from the experiment. The SVM model was first calibrated and then validated. The coefficient of determination (R²), the mean absolute error (MAE), the mean absolute percentage error (MAPE) and the root mean square error (RMSE) were used to evaluate the predictive accuracy. Furthermore, the main factors that influence the predicted compressive strength were identified by sensitivity analysis. Finally, the performance of the SVM model was further evaluated by comparison with an ANN model.

Theory of SVM
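The essence of SVM is to map data samples that have a highly nonlinear relationship in a low-dimensional space onto a high-dimensional space. The data samples are then classified or regressed according to the principle of structural risk minimization. The regression function $f(x, \omega)$ is expressed by the following equation:
$$f(x, \omega) = \sum_{j} \omega_j g_j(x) + b$$
where $g_j(x)$ is a mapping function, $\omega_j$ is the weight coefficient and $b$ is the threshold. According to the principle of structural risk minimization, the regression optimization problem can be expressed as:
$$\min \; \frac{1}{2} \lVert \omega \rVert^{2} + c \sum_{i} (\varepsilon_i + \varepsilon_i^{*})$$
subject to
$$y_i - f(x_i, \omega) \le \varepsilon + \varepsilon_i, \quad f(x_i, \omega) - y_i \le \varepsilon + \varepsilon_i^{*}, \quad \varepsilon_i, \varepsilon_i^{*} \ge 0$$
where $c$ is the penalty parameter, $\varepsilon_i$ and $\varepsilon_i^{*}$ are the slack variables, and $\varepsilon$ is the parameter of the insensitive loss function.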
Then, the optimization problem can be transformed into its dual problem, and the regression function can be written as:
$$f(x) = \sum_{i=1}^{n_{sv}} (\alpha_i - \alpha_i^{*}) k(x, x_i) + b$$
where $n_{sv}$ is the number of support vectors; $k(x, x_i)$ is the kernel function; $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers; and $c$ is the penalty parameter. The multipliers $\alpha_i$ and $\alpha_i^{*}$ are obtained by solving the above constrained optimization problem, and the threshold value $b$ can then be calculated from them.
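To make the roles of the penalty parameter c and the insensitive-loss parameter ε concrete, the following minimal Python sketch evaluates the regularized risk for a linear model, i.e. assuming the mapping functions are simply g_j(x) = x_j. The function names and the toy numbers are illustrative assumptions and are not taken from the study, which was implemented in MATLAB.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def regularized_risk(weights, bias, xs, ys, c, eps):
    """Return 0.5 * ||w||^2 + c * (sum of eps-insensitive losses).

    For fixed weights and bias, the smallest feasible total slack of a
    sample equals its eps-insensitive loss, so this value is the primal
    objective evaluated at (weights, bias).
    """
    norm_term = 0.5 * sum(w * w for w in weights)
    loss_term = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * v for w, v in zip(weights, x)) + bias
        loss_term += eps_insensitive_loss(y, pred, eps)
    return norm_term + c * loss_term


# Toy data (hypothetical): only the third sample lies outside the tube.
toy_x = [[1.0], [2.0], [3.0]]
toy_y = [2.0, 4.1, 6.5]
risk = regularized_risk([2.0], 0.0, toy_x, toy_y, c=1.0, eps=0.2)
print(risk)  # approximately 2.3 = 0.5 * 2^2 + 1.0 * (0.5 - 0.2)
```

A larger c penalizes points outside the tube more heavily, while a larger ε widens the tube and so ignores more of the small residuals.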

Calculation process
The MATLAB software was used to implement the SVM model. The calculation process of the SVM model is shown in Fig 1.
1. Determine the training dataset
$$\{(x_i, y_i)\}_{i=1}^{l}, \quad x_i \in \mathbb{R}^{n}, \quad y_i \in \mathbb{R}$$
where $x_i$ is the vector of factors influencing the compressive strength of group $i$; $y_i$ is the expected output strength value of group $i$ from the training data; and $l$ is the number of training groups.
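2. Choose the appropriate kernel function and solve the optimization problem. In this study, the radial basis function (RBF) is chosen as the kernel function:
$$k(x_i, x_j) = \exp\left(-g \lVert x_i - x_j \rVert^{2}\right)$$
According to the principle of structural risk minimization, the regression optimization goal is expressed as:
$$\min_{\alpha, \alpha^{*}} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) k(x_i, x_j) + \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^{*}) - \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^{*})$$
subject to
$$\sum_{i=1}^{l} (\alpha_i - \alpha_i^{*}) = 0, \quad 0 \le \alpha_i, \alpha_i^{*} \le c, \quad i = 1, \ldots, l$$
Then, the optimal solution can be calculated as:
$$\bar{\alpha} = (\alpha_1, \alpha_1^{*}, \ldots, \alpha_l, \alpha_l^{*})^{T}$$
where $l$ is the number of training groups; $c$ is the penalty parameter; and $g$ (gamma) is the kernel function parameter.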
3. Optimize the parameters. The penalty parameter c and the kernel function parameter g directly affect the prediction results. However, there is so far no single best way to determine c and g. In this study, the parameters c and g were obtained by K-fold cross-validation [24]: for each candidate pair (c, g), the model is trained K times, each time holding out one of the K folds for validation, and the average prediction accuracy over the K folds is denoted CV. The pair (c, g) that gives the best CV is taken as the optimum. The optimal solution $\bar{\alpha}$ is then solved for under the constraints of the optimization problem.

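4. Calculate the threshold value
The threshold $b^{*}$ is calculated as follows:
$$b^{*} = y_j - \sum_{i=1}^{n_{sv}} (\alpha_i - \alpha_i^{*}) k(x_i, x_j) - \varepsilon$$
for any $j$ with $0 < \alpha_j < c$; for a $j$ with $0 < \alpha_j^{*} < c$, the sign of $\varepsilon$ is reversed.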

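5. Construct the decision function
After $\bar{\alpha}$, $b^{*}$ and $g$ have been determined, the prediction is calculated as follows:
$$f(x) = \sum_{i=1}^{n_{sv}} (\alpha_i - \alpha_i^{*}) k(x_i, x) + b^{*}$$
where $x$ is the input data for which the compressive strength is to be predicted.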
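The final step of the calculation process, evaluating the decision function with an RBF kernel, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the support vectors, the coefficient differences and the threshold below are made-up numbers, and only the kernel parameter g = 0.3299 is taken from the study. The names `rbf_kernel` and `svr_predict` are hypothetical.

```python
import math


def rbf_kernel(x, z, g):
    """RBF kernel k(x, z) = exp(-g * ||x - z||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, dual_coefs, b, g):
    """Decision function f(x) = sum_i (alpha_i - alpha_i*) k(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i - alpha_i* of support
    vector i, which is all the prediction step needs.
    """
    return b + sum(coef * rbf_kernel(sv, x, g)
                   for sv, coef in zip(support_vectors, dual_coefs))


# Hypothetical fitted model: two support vectors whose coefficient
# differences sum to zero, as the dual equality constraint requires.
svs = [[0.0, 0.0], [1.0, 1.0]]
coefs = [0.5, -0.5]
threshold = 40.0
print(svr_predict([0.0, 0.0], svs, coefs, threshold, g=0.3299))
```

Far away from every support vector the kernel values vanish, so the prediction falls back to the threshold b.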

Performance evaluation methods
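The R², MAE, MAPE and RMSE were used to evaluate the prediction accuracy of the SVM model [25]. R² is a measure of how well the model, driven by the independent variables, reproduces the measured dependent variable, while MAE, MAPE and RMSE measure the differences between the values predicted by the model and the measured values. Low values of MAE, MAPE and RMSE and high values of R² are generally indicative of good performance. They are defined as follows:
$$R^{2} = \frac{\left(n \sum_{i=1}^{n} y_i \hat{y}_i - \sum_{i=1}^{n} y_i \sum_{i=1}^{n} \hat{y}_i\right)^{2}}{\left[n \sum_{i=1}^{n} y_i^{2} - \left(\sum_{i=1}^{n} y_i\right)^{2}\right] \left[n \sum_{i=1}^{n} \hat{y}_i^{2} - \left(\sum_{i=1}^{n} \hat{y}_i\right)^{2}\right]}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}$$
where $y_i$ and $\hat{y}_i$ are the actual value and the predicted value of sample $i$, respectively; $n$ is the number of data samples.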
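As an illustration of these four measures, the sketch below computes them for a small set of made-up strength values. It assumes that R² is taken as the squared correlation coefficient between actual and predicted values, which is one common definition; the function names and the sample numbers are hypothetical.

```python
import math


def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    s_a, s_p = sum(actual), sum(predicted)
    s_ap = sum(a * p for a, p in zip(actual, predicted))
    s_aa = sum(a * a for a in actual)
    s_pp = sum(p * p for p in predicted)
    num = (n * s_ap - s_a * s_p) ** 2
    den = (n * s_aa - s_a ** 2) * (n * s_pp - s_p ** 2)
    return num / den


def mae(actual, predicted):
    """Mean absolute error, in the unit of the data (MPa here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the unit of the data (MPa here)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical compressive strengths in MPa.
measured = [50.0, 60.0, 70.0, 80.0]
estimated = [52.0, 59.0, 71.0, 78.0]
print(r_squared(measured, estimated), mae(measured, estimated),
      mape(measured, estimated), rmse(measured, estimated))
```

Because MAPE divides by the actual value, it is only meaningful when no measured strength is zero, which holds for compressive strength data.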

Materials
Type II ordinary Portland cement (P II 52.5) purchased from China United Cement Corporation was used in this study. The chemical composition and mineral composition of the cement are shown in Tables 1 and 2, respectively. ISO standard sand with a density of 2.58 g/cm³, obtained from Xiamen standard sand Co., Ltd., was used as fine aggregate. The maximum size of the sand was 4.75 mm. A superplasticizer (SP) with a solid content of 36.8 wt% and a water-reducing ratio of 29.8% was purchased from Jiangsu Sobute New materials Co. Ltd. Tap water was used for the mixtures and for curing. Analytically pure Na₂SO₄, obtained from China Guoyao Chemical Company, was used to prepare sodium sulfate solutions of different concentrations.

Mix proportion and specimens preparation
Three types of mortar (M65, M50 and M28), with w/c ratios of 0.65, 0.50 and 0.28, respectively, were designed; their compositions are shown in Table 3. After the fresh mortar was prepared, the mixtures were cast into 40 × 40 × 160 mm steel moulds and compacted on a vibrating table.
The samples were demoulded 24 hours after casting. After demoulding, the specimens were cured in a curing room (Temperature = 20 ± 2˚C, RH > 95%) for 90 days.

Accelerated deterioration test
After curing for 90 days, the specimens were degraded in a constant temperature and humidity chamber (Shanghai Jinghong Experimental Equipment Co., Ltd.), whose temperature range is 20˚C to 80˚C and whose humidity range is 50% to 95%. Three different

Compressive strength after deterioration
The compressive strengths of mortars exposed to sulfate attack are shown in Fig 2. The results show that the compressive strength of all specimens increased gently in the early stages. This increase is probably because sulfate ions diffuse slowly into the specimens, so the specimens have not yet been corroded. When sulfate ions start to react with cement hydration products, they form expansive products (gypsum and ettringite) with larger volumes than the reactants, and the growing volume of these products makes the pore structure denser in the early stage of deterioration [28]. As the amount of expansive products continues to increase, the compressive strength decreases gradually. Fig 2 also shows that the greater the w/c ratio, the faster the compressive strength decreases. These different rates of decrease can be attributed to the different sulfate resistances of specimens with different micro-pore characteristics and porosities. Sulfate resistance is higher at a lower w/c ratio: for mortar with a lower w/c ratio, the pore structure is much denser and the ion diffusivity is lower, so a smaller amount of sulfate ions reacts with the hydration products.

Data preprocessing
To develop an SVM model that successfully predicts the compressive strength, sufficient experimental data are needed. In this study, the experimental data set was collected from laboratory tests, and 638 data samples were prepared. A total of 550 samples were used for model training, and 88 samples were used for model testing.
Examples from the database are shown in Table 4. The training data set was used to calibrate the model with 10 input variables, and the testing data set was used to estimate the model's performance. As shown in Table 4, the following input variables were used: w/c ratio, cement content (C, %), water content (W, %), sand content (S, %), sulfate ion concentration (SO₄²⁻, wt%), wetting temperature (Wet-T, ˚C), wetting time (Wet-t, hours), drying temperature (Dry-T, ˚C), drying time (Dry-t, hours) and exposure time (Exp-t, days). The ultimate compressive strength (Fm, MPa) was used as the output parameter.
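The partition of the 638 samples into 550 training and 88 testing samples can be sketched as follows. How the samples were actually assigned is not stated, so the seeded random shuffle used here is purely an assumption, and `split_indices` is a hypothetical helper; the feature labels follow the abbreviations listed above.

```python
import random

# Ten input variables and one output, using the abbreviations of Table 4.
FEATURES = ["w/c", "C", "W", "S", "SO4", "Wet-T", "Wet-t", "Dry-T", "Dry-t", "Exp-t"]
TARGET = "Fm"


def split_indices(n_samples, n_train, seed=0):
    """Shuffle sample indices reproducibly and cut them into two parts.

    ASSUMPTION: a random split; the study does not say how the training
    and testing samples were selected.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(n_samples=638, n_train=550)
print(len(train_idx), len(test_idx))  # 550 88
```

Fixing the seed keeps the split reproducible, so that the SVM and ANN models can be compared on exactly the same testing samples.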

Optimum parameters determining
As shown in Fig 3, the mean square error (MSE) was calculated by cross-validation training over the search range of c and g (search range: 2^−8 to 2^8). When the cross-validation MSE (CVmse) reached its best value of 0.0038, the optimal penalty parameter c was 1 and the optimal RBF kernel parameter g was 0.3299. According to Yassi and Moattar [24], values of c and g that are too small cause under-fitting, while values that are too large cause over-fitting; both impair the generalization ability of the model.
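The search described above can be sketched as a K-fold cross-validated grid search over powers of two. To stay dependency-free, the sketch below swaps in kernel ridge regression with an RBF kernel as a stand-in learner; it is not the SVM regression used in the study, and it only shares the two tuning parameters (c, here acting as the inverse of the ridge penalty, and g). The data are synthetic and every function name is hypothetical.

```python
import math
import random


def rbf(a, b, g):
    """RBF kernel k(a, b) = exp(-g * ||a - b||^2)."""
    return math.exp(-g * sum((p - q) ** 2 for p, q in zip(a, b)))


def solve(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [A[i][:] + [y[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z


def fit_predict(train_x, train_y, test_x, c, g):
    """Stand-in learner (kernel ridge, NOT SVR): solve (K + I/c) a = y."""
    n = len(train_x)
    K = [[rbf(train_x[i], train_x[j], g) + (1.0 / c if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    a = solve(K, train_y)
    return [sum(a[i] * rbf(train_x[i], x, g) for i in range(n)) for x in test_x]


def cv_mse(xs, ys, c, g, k=5):
    """Mean squared error averaged over k held-out folds."""
    folds = [list(range(i, len(xs), k)) for i in range(k)]  # interleaved folds
    total = 0.0
    for fold in folds:
        held = set(fold)
        tr = [i for i in range(len(xs)) if i not in held]
        pred = fit_predict([xs[i] for i in tr], [ys[i] for i in tr],
                           [xs[i] for i in fold], c, g)
        total += sum((p - ys[i]) ** 2 for p, i in zip(pred, fold)) / len(fold)
    return total / k


def grid_search(xs, ys, exponents, k=5):
    """Return (cv_mse, c, g) with the lowest CV error on a 2**e grid."""
    return min((cv_mse(xs, ys, 2.0 ** ec, 2.0 ** eg, k), 2.0 ** ec, 2.0 ** eg)
               for ec in exponents for eg in exponents)


# Synthetic one-dimensional data (hypothetical), searched from 2^-8 to 2^8.
rng = random.Random(0)
xs = [[rng.random()] for _ in range(20)]
ys = [math.sin(3.0 * x[0]) + 0.05 * rng.gauss(0.0, 1.0) for x in xs]
best_mse, best_c, best_g = grid_search(xs, ys, range(-8, 9, 2))
print(best_mse, best_c, best_g)
```

The exponent step of 2 only keeps this toy run fast. The reported optimum g = 0.3299 is approximately 2^−1.6, which suggests that the grid used in the study had a finer, fractional exponent step.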

Prediction performance
Based on the experimental data, an SVM model was established to learn the complicated interrelationships between compressive strength and the various input variables. For convenient comparison, scatter diagrams of the experimental and predicted results are plotted in Figs 4 and 5. It can be seen that most predicted points are close to the experimental values. The performance measures were calculated using Eqs (10), (11), (12) and (13). The coefficient of determination was 0.9994 for the training data and 0.9991 for the testing data. The MAE values were all less than 3.1 MPa, the MAPE values were all less than 3.8%, and the RMSE values of the training and testing data were less than 3.6 MPa. These results indicate that the SVM model performs well in predicting the compressive strength of mortars exposed to sulfate attack.

Sensitivity analysis
To assess the changes in the output caused by changes in the inputs of the SVM model, a sensitivity analysis was performed. According to the calculation method developed by Liong et al., the sensitivity of an input parameter can be calculated by the following formula [29]:
$$\text{Sensitivity}\ (\%) = \frac{1}{n} \sum_{j=1}^{n} \left( \frac{\%\ \text{change in output}}{\%\ \text{change in input}} \right)_{j} \times 100$$
where n is the number of data points. The sensitivity analysis was carried out by varying each input parameter, one at a time, at a constant rate of 20% while the other input parameters were held constant. The greater the variation observed in the output, the more sensitive the model is to that input. The sensitivity analysis results are shown in Fig 8. The predicted compressive strength was found to be sensitive to exposure time, w/c ratio and sulfate concentration; among these factors, exposure time was the most sensitive.
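The one-at-a-time procedure can be sketched as below, assuming the sensitivity level is the mean ratio of the percentage change in the output to the percentage change in the input, multiplied by 100. The toy model stands in for the trained SVM model, and all names and numbers are hypothetical.

```python
def sensitivity(model, samples, feature, rate=0.20):
    """Mean (% change in output / % change in input) * 100 for one input.

    Each sample has the chosen input increased by `rate` (20% by default)
    while all other inputs are held at their original values. The model
    output must be nonzero, and an input that is exactly zero is left
    unchanged by a relative perturbation.
    """
    total = 0.0
    for x in samples:
        base = model(x)
        perturbed = list(x)
        perturbed[feature] = x[feature] * (1.0 + rate)
        pct_out = (model(perturbed) - base) / base * 100.0
        pct_in = rate * 100.0
        total += pct_out / pct_in
    return total / len(samples) * 100.0


def toy_model(x):
    """Hypothetical stand-in for the trained model: depends on x[0] only."""
    return 2.0 * x[0]


toy_samples = [[1.0, 5.0], [2.0, 7.0], [4.0, 1.0]]
for index in range(2):
    print(index, sensitivity(toy_model, toy_samples, index))
```

For this toy model the output is proportional to the first input, so its sensitivity level is about 100%, while the second input, which the model ignores, scores 0%.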

Training of ANN
A comparative study between the SVM model and an ANN model was carried out. An ANN is composed of many artificial neurons linked together via a network of weights and biases, with the output of one neuron carried as input to another. The training procedure of an ANN consists of finding the optimum values of these weights and biases. One of the most useful algorithms for training a multilayer perceptron neural network is the BP algorithm [30][31][32][33]. This method calculates the error between the network outputs and the desired targets and propagates it back through the network via a learning mechanism. As a result, the weights and biases (thresholds) are updated until the network reaches a predefined performance goal.
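A minimal sketch of back-propagation training for a network with one hidden layer is given below. The network size, learning rate, stopping goal and data are illustrative assumptions and do not reproduce the ANN configuration of this study; all function names are hypothetical.

```python
import math
import random


def forward(params, x):
    """Return the hidden activations and the network output for one input."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return h, sum(w * a for w, a in zip(w2, h)) + b2


def mse(params, xs, ys):
    """Mean squared error of the network over a data set."""
    return sum((forward(params, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases: [w1, b1, w2, b2]."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return [w1, [0.0] * n_hidden, w2, 0.0]


def train_bp(params, xs, ys, lr=0.1, max_epochs=500, goal=1e-4):
    """Update weights and biases in place until the MSE goal is reached."""
    w1, b1, w2, _ = params
    for _ in range(max_epochs):
        for x, y in zip(xs, ys):
            h, out = forward(params, x)
            err = out - y  # gradient of 0.5 * (out - y)^2 w.r.t. the output
            for j in range(len(w2)):
                # Error propagated back to hidden node j (uses w2[j]
                # before it is updated on the next line).
                delta = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * delta
                for i in range(len(x)):
                    w1[j][i] -= lr * delta * x[i]
            params[3] -= lr * err
        if mse(params, xs, ys) < goal:
            break
    return params


# Hypothetical data: learn y = x^2 on [0, 1].
xs = [[i / 10.0] for i in range(11)]
ys = [x[0] ** 2 for x in xs]
params = init_params(n_in=1, n_hidden=4)
mse_before = mse(params, xs, ys)
train_bp(params, xs, ys)
mse_after = mse(params, xs, ys)
print(mse_before, mse_after)
```

Because the weights start from random values and follow the local gradient, different seeds can end in different solutions, which is the local-minimum behavior commonly attributed to BP training.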
In this study, a BP algorithm was used to establish the ANN model. The whole operation is demonstrated in Fig 9. It can be divided into six steps.
Step 1: Input training factors. Ten influencing factors are input into the model;
Step 2: Hidden nodes calculate the output. This is a rather complex process whose detailed calculations are not visible;
Step 3: Output nodes calculate the outputs;
Step 4: Compare the outputs with the targets and work out the difference;
Step 5: Adjust the model parameters on the basis of the training rule using the results of [...] Table 5.
For the ANN model, the R² values of the training data and testing data were 0.9982 and 0.9975, respectively. The MAE values of the training and testing data were less than 5.2 MPa, the MAPE values were less than 5.9%, and the RMSE values were less than 6.6 MPa. These results indicate that the ANN model also predicts the compressive strength well. Compared with the SVM model, however, the ANN model has a lower R² and higher RMSE, MAE and MAPE, indicating that the SVM model performs better than the ANN model. The reason is that the BP algorithm used in this study may find suboptimal solutions by becoming trapped in local minima. Moreover, the number of hidden layer neurons is estimated by trial and error, and the number of neurons in the input layer is equal to the number of input variables. The training iterations may therefore over-train the ANN model and impair its predictive capability. The SVM model, in contrast, whose objective is to construct a hyperplane that lies "close" to as many of the data points as possible, achieves good generalization ability by minimizing the regularized risk function, with its main parameters c and g obtained by K-fold cross-validation.

Prediction performance for corroded samples
The SVM model was further verified by testing it with experimental data on the compressive strength of cement-based materials deteriorated in sulfate and seawater. A group of [...] potentially used as an effective method to predict the compressive strength of cement-based materials in harsh environments. The accuracy of these predictions differs because, besides exposure time, w/c ratio and sulfate ions, the predicted compressive strength of Concrete-A 5N would also be affected by aggregate content and magnesium ion content.

Conclusions
In this study, a prediction model of mortar compressive strength was established using SVM. A total of 638 data samples collected from the experimental tests were used to develop the SVM model for predicting compressive strength. The SVM model was first calibrated and then verified using experimental data from corroded concrete samples. The following conclusions can be drawn:
1. The compressive strength of all mortar specimens first increased and then decreased gradually when the specimens were degraded by sodium sulfate solutions. The experimental results show that, after degradation by solutions of the same concentration, the greater the w/c ratio, the faster the compressive strength decreased.
2. The sensitivity analysis results show that the main factors influencing the prediction of mortar compressive strength were exposure time, w/c ratio and sulfate concentration.
3. The compressive strengths predicted by the developed SVM model matched the experimental values well, indicating that the SVM model can potentially be used for predicting the compressive strength of cement-based materials serving in harsh environments. The SVM model performed better than the ANN model.