Analysis of the Importance of Oxides and Clays in Cd, Cr, Cu, Ni, Pb and Zn Adsorption and Retention with Regression Trees

This study determines the influence of the different soil components and of the cation-exchange capacity on the adsorption and retention of six heavy metals: cadmium, chromium, copper, nickel, lead and zinc. To this end, regression models were built by means of decision trees and the importance of the soil components was assessed. The variables used were: humified organic matter, specific cation-exchange capacity, the percentages of sand and silt, the proportions of Mn, Fe and Al oxides and hematite, the proportions of quartz, plagioclase and mica, and the proportions of the different clays: kaolinite, vermiculite, gibbsite and chlorite. The most important components in the obtained models were vermiculite and gibbsite, especially for the adsorption of cadmium and zinc, while clays were less relevant. Oxides are less important than clays, especially for the adsorption of chromium and lead and the retention of chromium, copper and lead.


Rationale and background
Heavy metal contamination is defined as excessive contamination by a series of metal elements, most of which are potentially toxic for the organisms in a given ecosystem [1]. There is no clear criterion for the formal definition of these elements, and numerous definitions and some classifications exist [2].
Although contamination by these elements affects all the levels in an ecosystem [3,4], the microorganisms in the soil, from bacteria to protozoa, are the first to be harmed. That is why these organisms show differences in their adaptation to said contamination [5].
Within these adaptations, bacteria are the group showing the greatest variability, given that the number of possible responses depends on their metabolic complexity and ranges from passive adsorption in the cell wall to active expulsion of heavy metals with energy expenditure [6]. Some of these mechanisms, such as passive adsorption in the cell wall, also occur in higher microorganisms, such as heavy-metal-tolerant fungi (Aspergillus niger and Penicillium sp., which tolerate Cr and Cd) [7]. In other cases, they use other mechanisms, such as certain modified fatty acids, which give them the capacity to resist high concentrations of heavy metals from, for example, industrial waste [8]. Lastly, protozoa have different mechanisms to repel and resist heavy metals, depending on whether they are autotrophic or heterotrophic species. Protozoa of the genus Euglena, such as Euglena gracilis, show both types of nutrition. In these cases, heterotrophic stages or species show detoxification methods based on glutathione, while autotrophic species or stages show metal expulsion mechanisms similar to those found in plants, such as phytochelatin synthesis (phytochelatins being the substances that accumulate heavy metals, preventing them from exerting their toxic action) [9,10].
In plants, heavy metals and other trace metals, such as aluminium, can affect their photosynthetic efficiency and vegetative growth, making it slower and less vigorous and triggering premature ageing [11,12].
For this reason, plants also have defense mechanisms against heavy metals, such as passive adsorption [13] or the production of heavy-metal chelating substances (phytochelatins) [14]. Cotransport with other ions, such as sodium, also exists, and in some cases there may be resistance to certain metals, such as Ni, but hypersensitivity to others, such as Pb [15]. Lastly, in some cases the expression of the whole genome changes, with specific genes that are expressed only when a particular metal (e.g. aluminium) is found in the soil [16].
So far in this paper, we have discussed mainly the dynamics of cadmium, chromium and nickel. Other heavy metals (included in this study) show the following effects:
• Copper affects mostly plants' leaves. It is very abundant in nature, both in its elemental form and as minerals and salts, being a component of certain rocks, such as serpentinites [17].
• Nickel is found in various types of rocks and salts, and it is a widely used mineral in metallurgy. It is also a micronutrient for plants, although it is toxic for them at much lower concentrations than it is for animals and humans [18].
• The majority of the lead present in the soils comes from anthropogenic emissions, especially when organic additives containing this metal were used in petrol, although it is also naturally found in certain types of soils. This material is toxic for humans, especially for children, and comes from various sources such as dust, water, air or food. As for plants, it is accumulated especially in the roots [19].
• Zinc is an essential metal for both plants and animals, but it can be harmful for both in high concentrations [18,20].
Heavy metals can concentrate in the food chain (soil to plant and plant to consumer, directly or through animals) thus becoming a threat to human health [21].
For all these reasons, decontamination methods are being studied and implemented, including physical methods like mechanical separation [22], electrokinetic remediation [23], chemical washing (ex-situ soil washing or in-situ soil flushing), soil amendments, including lime or other natural materials, chelating agents, nanomaterials or biological products [24,25].
Other methods to decontaminate soils polluted by heavy metals are bioremediation, which uses the microorganisms in the soil to decontaminate it [26], and phytoremediation, that is, the use of plants to extract the heavy metals, including the use of hyperaccumulator plants and the processes called phytostabilisation and phytoextraction [27,28]. These methods do not have the same environmental impacts as the others. Finally, active research into the use of transgenic plants specially tailored to extract heavy metals is producing interesting results [29].
Nowadays, new analysis methods to prevent contamination by heavy metals are being researched, especially for agricultural practices, such as soil fertilization [30], but this is also gaining strength in other fields, such as decorative plants [31].
Nonetheless, the soil is the place where the behaviour of heavy metals in an ecosystem at the bioavailability level can be predicted. The different mineral and organic components of the soil can be a barrier for heavy metals by means of the adsorption and retention mechanisms [17,32].
Aim of the study and source of the data
The aim of this study is to rank the different components of the soil, particularly oxides and clays, according to their relevance coefficient in a regression tree model.
As for the source of the data used, they were obtained from different soils in Galicia (NW Spain), including samples of soils which, due to their composition, are naturally contaminated by heavy metals, given that high concentrations of certain metals can interfere in the adsorption and retention of others [33].

Characteristics of the sample
The data analysed in this study were gathered by Covelo EF for her thesis [17] from 14 soils with three samples each (42 samples). The characteristics of these soils are listed in Table 1. No permission was needed for sampling in those locations.
In all of them, the following tests were performed, according to the protocols in [35]:
• Determining the soil organic matter through wet assessment with potassium dichromate.
• Determining the soil texture by separating the different soil components (sand, clay, etc.) through wet and dry sieving.
• Determining the mineral identification and properties through techniques such as X-ray diffraction.
• Determining the soil exchange capacity.
As a result of these analyses, the variables described in Table 2 were recorded together with their most relevant descriptive values. Certain components were observed to be absent from Galician soils, while the most adsorbed and retained metals are those which are competitively stronger [17].
Following these analyses, the adsorption isotherms for each soil and metal were drawn, and the distribution coefficient $K_d$ and the accumulated distribution coefficient $K_{dS}$ were calculated. These are defined as the quotient of the metal concentration in the soil, in micromoles, and that existing in the solution, multiplied by $10^{-3}$. In the experiments where $K_d$ was determined, a metal solution with a concentration of 100 μmol L$^{-1}$ was used, while the concentration for $K_{dS}$ varied from 5 to 400 μmol L$^{-1}$.
In this way, the adsorption and retention isotherms for the different metals are determined, and it is possible to work with these coefficients [17,33,36,37,38]. However, an analysis of the results showed they were irregular, and so these data were converted into the adsorption and retention percentages used in this study, which are referenced in Table 2. It is relevant to compare the results obtained by Covelo et al. [32,33] with those obtained through a more accurate model validation and selection method. Initially, treating Fe oxides and Al oxides as a single component was considered, given the evidence of their collinearity [17,33,36,37,38], but the trees obtained were similar to those in [32,33], so it was decided to treat them separately.

Selection of the statistical technique
In order to achieve the aforementioned aims, regression models were built showing the importance of the different soil components for the adsorption and retention capacity for the different heavy metals. Regression trees (CART) were used to build those models [40], mainly because they are very flexible models, able to reproduce complex non-linear relationships between the independent variables (soil components) and the dependent variables (retention and adsorption). In this regard, linear models cannot be deemed, a priori, sufficient to capture the relationships between both groups of variables, and their use would impose unjustified restrictions on the form of said relationships.
The aforementioned advantage of CART is also shared by other machine learning techniques, such as neural networks or support vector machines [40], or other techniques that usually fall within non-parametric statistics, such as generalized additive models [41]. Nonetheless, trees have the following comparative advantages:
• They are very interpretable models, due to their graphic representation and to the fact that they can be described in terms of a natural language close to the researcher.
• Despite being non-linear models, where it is always difficult to analyse the influence of the independent variables on the response, regression trees allow an intuitive analysis of the importance of these variables from their behaviour during the process of building the tree.
• Their estimation (learning, in machine learning terminology) entails an entirely affordable computational cost (the estimation of a model with a dozen independent variables and one dependent variable, using several hundred data points, is performed in seconds), considerably lower than that required by the other aforementioned techniques.
As a drawback, like any other machine learning or non-parametric technique, their great fitting capacity makes it necessary to control the complexity of the model in order to prevent overfitting. To do so, model selection techniques are required, which in this study were based on cross-validation methods.
Regression trees. Regression trees (CART) are a machine learning technique for modelling a regression problem. They are non-parametric, for they do not postulate a priori a fixed parametric model of the relationship between the explanatory variables and the response, but rather adapt their structure to the relationship present in the data sample. In this regard, they are very flexible, since they can capture continuous non-linear relationships of arbitrary complexity, but, for this same reason, they have a great capacity to fit the data, which makes it necessary to use techniques to control their complexity and prevent their natural tendency to overfit.
Given a data sample $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in X \subset \mathbb{R}^d$ and $y_i \in Y \subset \mathbb{R}$, $i = 1, \ldots, n$, $X$ is the input space and $Y$ is the output space, the regression tree responds to the following model:

$$\hat{f}(x) = \sum_{j=1}^{h} \hat{c}_j \, 1_{A_j}(x) \qquad (1)$$

where $\hat{c}_j \in \mathbb{R}$ and $1_{A_j}(x)$ is the indicator function taking value 1 if $x \in A_j$ and zero otherwise. Thus, the regression tree splits the input space $X$ into different regions $A_j$ which are exhaustive and exclusive, and assigns the value $\hat{c}_j$ to all the individuals in each region $A_j$.
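As an illustration only, the piecewise-constant form of the model can be sketched in Python (the study itself used R). The function name `tree_predict`, the toy regions and the representation of each region as an indicator function paired with its constant are assumptions made for this sketch, not part of the original analysis.

```python
def tree_predict(regions, x):
    """Evaluate a piecewise-constant model at point x.

    regions: list of (indicator, c) pairs, where indicator(x) is True
    exactly when x lies in the region A_j and c is the constant c_j.
    The regions are assumed to be exhaustive and mutually exclusive,
    so exactly one constant contributes to the sum.
    """
    return sum(c for indicator, c in regions if indicator(x))


# Hypothetical two-region split of a one-variable input space at x[0] = 2.5.
regions = [
    (lambda x: x[0] < 2.5, 10.0),
    (lambda x: x[0] >= 2.5, 20.0),
]

print(tree_predict(regions, [1.0]))
print(tree_predict(regions, [3.0]))
```

Because the regions do not overlap, the sum reduces to the constant of the single region containing the point.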
The estimation problem of regression trees lies in determining the split $\{A_j\}_{j=1}^{h}$ (and the number $h$ of regions), as well as the coefficients $\hat{c}_j$ of expression (1) above. In this study, this estimation has been carried out in the following stages:
• Estimation of the maximum-complexity tree
• Selection of the model.
Later, the importance of each of the variables was analysed. Each of these stages is briefly described below.
• Estimation of the maximum-complexity tree: The aim of this stage is to develop a maximum-complexity tree from which different lower-complexity trees are to be built. The most suitable among these will be selected in the subsequent model selection stage. The maximum-complexity tree is built by making successive splits (branches) of the input space. Each split is defined by a condition of type $x_j < a$, where $x_j$ is one of the components of the variable vector $x \in X \subset \mathbb{R}^d$. The variable $x_j$ and the cut-off value $a$ are selected so that the least squares criterion is minimized:

$$\min_{j,\,a} \left[ \sum_{i \in L_1} (y_i - c_1)^2 + \sum_{i \in L_2} (y_i - c_2)^2 \right]$$

where $L_1$ and $L_2$ denote the observations that satisfy and do not satisfy $x_j < a$, respectively, and $c_k$, $k = 1, 2$, is the average of the response variable over the individuals in each of the two sets (leaves) of the splitting defined by $x_j < a$. The splitting process continues by applying the aforementioned method to each of the regions obtained up to that moment, until a series of stopping criteria, aimed at avoiding excessively complex trees, are reached. A usual criterion is to set a lower limit on the number of individuals in each split (3 data in this study). Otherwise, the maximum tree would end up building a split with as many regions as individuals in the sample.
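The search for a single least-squares split described in this stage can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the rpart routine used in the study: the names `sse`, `best_split` and `min_leaf` are hypothetical, candidate cut-offs are taken as midpoints between consecutive distinct values, and the toy data are invented.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, min_leaf=1):
    """Return (variable index j, cut-off a, criterion value) of the best split.

    Rows with x[j] < a go to the first leaf and the rest to the second.
    Splits leaving fewer than min_leaf rows in a leaf are discarded.
    Returns None when no admissible split exists.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Assumption: candidate cut-offs are midpoints of consecutive values.
        for lo, hi in zip(values, values[1:]):
            a = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] < a]
            right = [yi for row, yi in zip(X, y) if row[j] >= a]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            crit = sse(left) + sse(right)
            if best is None or crit < best[2]:
                best = (j, a, crit)
    return best


# Invented toy sample: the response changes level when variable 0 passes 2.5.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [10.0, 10.0, 20.0, 20.0]
print(best_split(X, y))
```

Growing the maximum tree amounts to applying this search recursively to each resulting region until a stopping criterion, such as the minimum leaf size, is met.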
• Selection of the model: Once the maximum tree has been built, the different intermediate trees with decreasing complexity are built from it through the removal of sections (pruning). The optimal tree is selected from among these through an m-fold cross-validation method, as described below:
1. The sample is split into m groups of a similar size.
2. The maximum tree is built with the data from m-1 groups. The prediction error of the resulting trees is assessed on the data from the group that has been set aside, which have not been used by the algorithm. This is repeated so that each group is set aside once.
3. The number of sections producing the smallest quadratic prediction error over all the groups set aside from the estimation is selected.
4. A tree is estimated with this optimal number of sections, using all the data in the sample.
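The four steps above can be sketched in Python as follows. This is a simplified illustration, not the procedure implemented in the study: tree depth is used here as a stand-in for the number of pruned sections, the groups are formed by interleaving the observations, and all function names (`fit_tree`, `predict`, `cv_error`, `select_depth`) are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)


def sse(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v)


def fit_tree(X, y, depth, min_leaf=1):
    """Grow a least-squares regression tree with at most `depth` levels."""
    node = {"value": mean(y)}
    if depth == 0 or len(y) < 2 * min_leaf:
        return node
    best = None
    for j in range(len(X[0])):
        vals = sorted(set(r[j] for r in X))
        for lo, hi in zip(vals, vals[1:]):
            a = (lo + hi) / 2
            L = [i for i, r in enumerate(X) if r[j] < a]
            R = [i for i, r in enumerate(X) if r[j] >= a]
            if len(L) < min_leaf or len(R) < min_leaf:
                continue
            crit = sse([y[i] for i in L]) + sse([y[i] for i in R])
            if best is None or crit < best[0]:
                best = (crit, j, a, L, R)
    if best is None:
        return node
    _, j, a, L, R = best
    node["var"], node["cut"] = j, a
    node["left"] = fit_tree([X[i] for i in L], [y[i] for i in L], depth - 1, min_leaf)
    node["right"] = fit_tree([X[i] for i in R], [y[i] for i in R], depth - 1, min_leaf)
    return node


def predict(node, x):
    """Follow the splits down to a leaf and return its constant."""
    while "var" in node:
        node = node["left"] if x[node["var"]] < node["cut"] else node["right"]
    return node["value"]


def cv_error(X, y, depth, m):
    """Squared prediction error summed over the m held-out groups (steps 1-2)."""
    n, total = len(y), 0.0
    for k in range(m):
        test = [i for i in range(n) if i % m == k]   # group set aside
        train = [i for i in range(n) if i % m != k]  # remaining m-1 groups
        tree = fit_tree([X[i] for i in train], [y[i] for i in train], depth)
        total += sum((y[i] - predict(tree, X[i])) ** 2 for i in test)
    return total


def select_depth(X, y, depths, m):
    """Pick the complexity with the smallest cross-validated error (step 3)."""
    errors = {d: cv_error(X, y, d, m) for d in depths}
    return min(depths, key=lambda d: errors[d]), errors


# Invented toy sample with a single jump in the response.
X = [[float(i)] for i in range(12)]
y = [0.0] * 6 + [10.0] * 6
best_depth, errors = select_depth(X, y, [0, 1, 2], m=3)
final_tree = fit_tree(X, y, best_depth)  # step 4: refit on the whole sample
```

Ties between complexity levels are resolved in favour of the first candidate listed, so listing the candidates from simplest to most complex favours the simpler tree.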
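Once the model has been estimated in step 4, its goodness of fit is assessed by calculating the coefficient of determination $R^2$, by means of the expression:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{f}(x_i) \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$

where $\hat{f}(x_i)$ is the prediction of the model for the i-th observation, $i = 1, \ldots, n$, and $\bar{y}$ is the sample mean of the response.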
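For illustration, the coefficient of determination can be computed as in the following Python sketch. It assumes the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares; the function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat for responses y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    return 1.0 - ss_res / ss_tot


# Invented observed percentages and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r_squared(observed, predicted))
```

A model that always predicts the sample mean scores 0 under this definition, and a model reproducing every observation scores 1.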
• Importance of the variables: Each iteration of the successive splitting process involved in the estimation of the maximum-complexity tree results in the selection of the best variable to make the splitting. Nonetheless, other variables that are not selected for the different optimal splittings could make good splittings (although not better than the chosen ones) and never be selected. For example, a variable which is always the second best for all the splittings could be the most important variable but would be "overshadowed" by those finally selected in the different splittings.
For this reason, the method used to determine the importance of the different variables [39] is based on the sum, over all nodes, of the decreases in quadratic error $\Delta R$ that the use of the j-th variable would yield in each splitting $t$, that is to say, the value $\sum_t \Delta R(j, t)$.
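A simplified reading of this importance measure can be sketched in Python as follows. The sketch assumes that, at each node, the decrease credited to a variable is the largest reduction in squared error it could achieve there; it is not claimed to reproduce the exact computation of the rpart package. The names `best_decrease` and `importance` are hypothetical, and the caller is assumed to supply the samples reaching each internal node.

```python
def sse(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v)


def best_decrease(X, y, j):
    """Largest drop in squared error obtainable at a node by splitting on variable j."""
    parent = sse(y)
    best = 0.0
    vals = sorted(set(row[j] for row in X))
    for lo, hi in zip(vals, vals[1:]):
        a = (lo + hi) / 2
        left = [yi for row, yi in zip(X, y) if row[j] < a]
        right = [yi for row, yi in zip(X, y) if row[j] >= a]
        best = max(best, parent - sse(left) - sse(right))
    return best


def importance(nodes, n_vars):
    """Sum the per-node decreases for each variable.

    nodes: list of (X, y) pairs, one per internal node of the tree,
    holding the observations that reach that node.
    """
    return [sum(best_decrease(X, y, j) for X, y in nodes) for j in range(n_vars)]


# Invented single-node example: variable 1 is never the best split,
# yet it still receives credit for the reduction it could have produced.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [10.0, 10.0, 20.0, 20.0]
scores = importance([(X, y)], n_vars=2)
```

Crediting each variable at every node, whether or not it was chosen, is what prevents a consistently second-best variable from being overshadowed.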
The whole process of estimation, model selection and assessment of the importance of the variables was carried out in the R environment [42]. The trees were estimated by means of the rpart (recursive partitioning) package [43], and the assessment of the predictive capacity was developed and tailored in said environment.

Results and Discussion
In this section, the trees obtained for the retention and adsorption of each of the heavy metals are shown, as well as the analysis of the importance of the soil features in each of said models.
Table 3 shows the goodness of fit of the different models in terms of the coefficient of determination $R^2$, obtained as described in the previous section.
Considering the values of $R^2$ included in Table 3, it can be established that the goodness of fit values in the improved model are better than when considered separately, which means that, in every instance, the trees obtained are closer to reality than those obtained by Covelo et al. [32,33].

Cadmium
The regression model for this metal works better for the adsorption process, with a goodness of fit of 98.27% of the total variation, while the percentage falls to 95.26% for retention.
In the regression tree for cadmium adsorption, as can be seen in Fig 1, the first splitting is done according to the Mn oxides, while vermiculite appears in the second splitting. However, the components appearing in three of the splittings are humified organic matter and the percentage of sand, the latter being less relevant. Although humified organic matter is so important in the adsorption of cadmium, this metal is not one of the most adsorbed metals in the soils with the highest humified organic matter content (humic umbrisols), which mainly adsorb chromium [44]. However, according to the coefficients of importance, the most important component for this model is the percentage of sand. As for clays and non-crystallized oxides, the most important ones are chlorite and Fe oxides, respectively. Among the clays, chlorite is followed by kaolinite, gibbsite and vermiculite, the latter being the least important of all components. Although oxides are not, in this case, the most important components in the adsorption of this metal, their specific area makes them highly efficient in the adsorption process, especially Fe oxides, although only at low concentrations, given that their adsorption capacity decreases as the concentration increases [45].
In the case of cadmium retention, as can be seen in Fig 2, the first splitting is again done according to Mn oxides, followed in the second one by the proportion of quartz, silt and vermiculite and the percentage of sand. In general, this tree has a simpler structure than in the case of adsorption, so fewer components are needed for the regression [40]. On the other hand, humified organic matter shows little relevance in this tree, and so it is inferred that this component is less important in the retention of this metal. Fe and Cr oxides are not relevant either, regarding both the changes to the tree structure and the relevance of the different variables [46,32,33].
As for the coefficients of importance for retention, it can be seen that the coefficient of importance of Mn oxides decreases regarding that of Fe or Al oxides in the adsorption process. In the case of clays, chlorite is the one showing the maximum value, while gibbsite is second in importance, shortly followed by kaolinite and vermiculite. In this case, as it was for adsorption, the percentage of sand is the most important component, so according to this model, the immobilization of this metal depends more on the oxides than it does on the clays. If working with coefficients of variation, the result applying main components would be the opposite [37] in the case that hematite was the most important component.

Chromium
This metal is found naturally in some rocks, such as serpentinites, whose soils were sampled for this study and some previous studies [32,33,36,37]. The regression tree model works similarly for adsorption and retention, with an $R^2$ of 98.30% and 98.37%, respectively.
In the modelling of chromium adsorption using regression trees, as shown in Fig 3, the first splitting is done according to the proportion of quartz in the soil, the second one being the percentage of sand. Oxides do not appear in the optimal tree model, even when Fe and Al oxides are considered as forming a complex and taken as a single variable in the model, while clays are represented by kaolinite only, which also appears in [32,33], so its role is more of a supporting one.
As for the coefficients of importance, the components with the highest values are quartz and humified organic matter, the conclusions by [47] not being that important, this being the most important variable, while Mn oxides show a much lower importance. In this model, the percentage of sand shows a lower importance than humified organic matter, so the adsorption of this metal depends less on the texture than had been expected by other authors, although it coincides with the description by [32,33].
Regarding the importance of clays, gibbsite is the most important one, followed by chlorite, kaolinite and vermiculite. This prominence of humified organic matter is due to the fact that these are the components which assist the adsorption action of quartz, given that this material is chemically inert [48].
Fig 4 shows the optimal tree for chromium retention obtained with regression trees. In this case, oxides do not appear in the optimal explanatory model, but kaolinite does appear, representing the clays. In the preceding analysis [32,33], clays did not appear, but Fe oxides did. Nonetheless, quartz is now the most explanatory component, followed by the percentage of sand and humified organic matter, which partially matches the preceding model. The biggest difference between the model by [32] and the present model is the higher relevance of oxides (especially Fe oxides). The explanatory tree is simpler, given that the regression steps are few or that dramatic differences exist between one tree and the following [40].
In the analysis of the coefficients of importance for retention, a situation similar to that of adsorption is observed, with quartz and the percentage of sand as the most important components. Humified organic matter loses importance, the importance of Mn oxides is null, and the importance of Fe and Al oxides is similar to that in the adsorption process, but lower than that of gibbsite, which is the most important clay, followed by chlorite, vermiculite and kaolinite. The importance of other components can also be seen, such as hematite, which, like Mn oxides, is not important for the retention of this metal. Complexes can be formed with different components, such as those with organic matter and clays [49].

Copper
The regression tree model for this metal shows a goodness of fit of 99.33% for adsorption and 99.43% for retention.  Humified organic matter appears together with chlorite, so it can be said that this clay is modified by said component, in a situation similar to the one described in [49].
In this case, the most important component in copper adsorption is the percentage of sand, followed by humified organic matter and the proportion of quartz. As for clays, the most important one is kaolinite, followed by vermiculite, gibbsite and, lastly, kaolinite. As for oxides, Mn oxides are more important than Fe and Al oxides. These oxides show a hyperbolic adsorption curve or isotherm for copper, its maximum concentration being much higher than that of other metals, such as nickel or zinc [50].
Fig 6 shows the regression tree for copper retention, where the variable making the first splitting is the proportion of hematite, followed by cation-exchange capacity (CEC) and the proportion of humified organic matter. Clays are not relevant in this case, nor in [33], updated by [51]. The same happens with amorphous oxides. Thus, retention is more dependent on the pH conditions and redox potential of the soil, so minerals are usually clays or hydrated.
The highest coefficient of importance corresponds to CEC, followed by the percentage of sand and humified organic matter. As for clays, kaolinite is the most important one, followed by gibbsite, chlorite and, lastly, vermiculite, with the lowest coefficient of importance. Fe and Al oxides show a high affinity for this metal, although their importance in Galician soils is low when they form a structural complex adsorbing and retaining other types of metals, such as cobalt and lead [53].

Nickel
For the adsorption and retention of nickel, the regression tree model works slightly better for retention, with an $R^2$ of 98.47%, while the goodness of fit for the adsorption process is 97.77% of the total variation.
The component found in the first splitting of the optimal regression tree for nickel adsorption (Fig 7) is the percentage of silt, followed by the percentage of sand and Fe oxides. Hematite is more relevant than it is in the model obtained by [32], where the formation of a structural complex between Fe and Al oxides is not considered and where Al oxides are relevant, being the only representative amorphous oxides. Clays are not relevant in either model, which tells us that texture factors are more easily fitted to nickel adsorption by the fitting method of regression trees [39].
Another way to perform the analyses is to consider the organic-mineral fraction or component rather than humified organic matter. In this case, the component in the same splitting does not vary, since it is still the percentage of silt, but the tree shows fewer levels of splitting [46].
Regarding the coefficients of importance, in this case the highest coefficient of importance is shown by the percentage of silt, followed by the proportions of humified organic matter and quartz. As for clays, gibbsite is the most important one, followed by kaolinite, vermiculite and chlorite. Oxides have a low importance, slightly higher for Mn oxides, and complexes may be formed with several types of oxides [54], although clays can also be modified by oxides [52].
The nickel retention tree can be seen in Fig 8. The first splitting is done according to the values of the proportion of hematite, followed by specific cation-exchange capacity and the percentage of sand. Clays do not appear, gibbsite being the only clay with certain relevance in the case of adsorption [32]. As for amorphous and non-hydrated oxides, they are not represented. The tree is completed with a splitting of humified organic matter, which tells us that nickel retention shows an action mechanism where hydrated components or components with a high capacity to adsorb water are preponderant, together with a certain texture. Therefore, a soil with good nickel retention would be one formed in a place with high rainfall or atmospheric humidity, not necessarily linked to an aquatic ecosystem [55].
Regarding the coefficients of importance, silt is the component with the highest coefficient of importance, followed by the proportion of mica and the percentage of sand. As for the importance of clays and oxides, their importance is not very high in general. The most important clay is kaolinite, while for the oxides the highest importance is for Fe oxides. Thus, this retention mainly occurs in mica and it is very dependent on the size of the grain rather than on the degree of hydration or the degree of crystallization of the different components [56].

Lead
The goodness of fit for lead is similar for adsorption (99.83%) and retention (99.76%). In the case of adsorption, the first splitting is done according to the specific cation-exchange capacity, while humified organic matter and Mn oxides are found on a second level. As for clays, only vermiculite appears, although gibbsite, kaolinite and vermiculite appear in [32]. It is thus postulated that, in the adsorption of this metal, several types of components add their effects as if they were a single component [49].
Regarding the analysis of the coefficients of importance, humified organic matter has the second highest coefficient of importance, after the percentage of sand, and is followed by chlorite and CEC. As for clays, chlorite is the most important one, far ahead of gibbsite, kaolinite and vermiculite, which is the least important clay. This adsorption depends on the pH conditions and redox potential of the soil, which determine the specific cation-exchange capacity [17]. Adsorption occurs more in clays than in oxides or humified organic matter. These differences in the adsorption can be explained by their different crystallographic structures [57].
In the retention process (Fig 10), the first splitting is also done by CEC, followed by humified organic matter (replacing the percentage of sand proposed in the model by [32]) and Mn oxides, which are the only significant oxides. As for clays, vermiculite appears, but this process is less dependent on the proportion of these compounds than adsorption [48].
As for the coefficients of importance, the first three components are the same as in the case of adsorption, but the third position is for the percentage of silt rather than chlorite. Even though the importance of the different oxides and clays is similar, and thus the same comment is applicable to both processes, in this case the importance of chlorite and humified organic matter cannot be said to be similar, and so they are not assumed to form a complex such as the one described in [49].

Zinc
The regression model for zinc works better for the adsorption, with a goodness of fit of 93.60% of the total variance, while the percentage for adsorption falls to 88.48% of said variation.
Zinc adsorption optimal tree is shown in Fig 11. The percentage of silt is the component making the first splitting, also appearing in later splittings. Vermiculite and plagioclase are at a second level. Gibbsite and vermiculite are the only clays represented. In the case of amorphous and non-hydrated oxides, they are not representative, so the adsorption of this metal is made preferentially in clays and hydrated metals such as hematite. In the case of clays, when coefficients of equilibrium are used, the adsorption shows a linear isotherm l [58].
As for the coefficients of importance, CEC is the most important variable for zinc adsorption, followed by quartz and chlorite, which is the most important clay, followed by gibbsite and vermiculite, kaolinite being the least important clay. In the case of oxides, Fe and Al oxides are most important than Mn oxides. In this model, the importance of the percentage of silt and chlorite is similar to that of quartz, which tells us that these three materials can form an adsorbent complex, just like Fe and Al oxides or, in some cases, Al and Mn oxides and hydroxides [54].
In the optimal zinc retention tree (Fig 12), only the percentage of silt (which is the one making the first splitting) and gibbsite are considered significant, so the model proposed in this tree is similar to those proposed by [32] and [46] using a competitive model. Thus, in the case of the behaviour of zinc in the soil, there are more influencing elements than those initially considered, as in the case when the competence between metals and the formation of the socalled organic-mineral complex are considered [46].
As for the coefficients of importance, quartz is the component with the highest importance, followed by the percentage of silt and the percentage of sand. Gibbsite is the most important clay, followed by chlorite, vermiculite and kaolinite, in this order. Regarding the oxides, Fe and Al oxides show a higher coefficient of importance than Mn oxides. This situation is comparable to the one described in [59], where oxides are mixed with silicates, and also to the study by [47], where oxides show an amorphous structure, the aim of that study being to assess the behaviour of these oxides in the adsorption of chromium.
The following results are obtained for the coefficients of importance, as shown in Table 4:
• Chlorite is the most important clay for the adsorption of cadmium, followed by kaolinite and gibbsite, vermiculite being the least important clay. Its values are nevertheless much lower than those of the most important components, such as the percentage of sand, CEC and humified organic matter, although in previous studies, such as [32], this variable was observed not to be among those which adsorb the most cadmium. As for oxides, Fe oxides are the most important, with a coefficient of importance similar but not equal to that of Al oxides, as found in [17].
• Gibbsite is the most important clay for the adsorption of chromium, followed by chlorite and kaolinite. As for gibbsite, this clay shows a coefficient of importance similar to that of
• Chlorite is the most important clay in the adsorption of lead, followed by gibbsite. As for oxides, Mn oxides are the most important, confirming the findings by [32], where these oxides, CEC and the composition of the clay fraction of the soil are the components which best explain the adsorption of this metal.
• In the adsorption of zinc, chlorite is the most important clay, followed by gibbsite. As for oxides, Fe oxides are the most important, while Mn oxides are not important for the adsorption of this metal. Co-adsorption with copper is not observed either, given that the variables explaining both processes are different [38].
As for the retention of the different metals, the results are shown in Table 5.
In the retention of the metals, regarding the importance of the variables (Table 5), it can be seen that:
• Just as for adsorption, chlorite is the most important clay in the retention of cadmium, with lower importance values than the most important variables, such as the percentage of sand and CEC. As for oxides, Fe and Al oxides have a similar importance, which does not coincide with the description by [32], where clays show a higher importance and Mn oxides are the most relevant oxides.
• Kaolinite is the most important clay for copper retention, just as in the case of adsorption. As for oxides, they show a lower importance, and Mn oxides are not important, so their affinity with copper is low, although they can be associated with clays for a higher efficiency [62].
• For nickel retention, kaolinite is the most important clay, followed by gibbsite. As for oxides, Fe oxides are the most important ones, although their importance is lower than the coefficient of kaolinite, which means clays predict the retention of this metal better than oxides [32].
• In lead retention, the most important clay is gibbsite. The contribution of oxides is not very important, although they are important in [32]. Hematite content is more important, as also reported in [46].
• In zinc retention, gibbsite is the most important clay, while Fe oxides are the most important oxides when these compounds are considered, although they do not appear in the optimal tree. This contribution by clays, together with the fact that the percentage of silt is the most important variable for Zn retention, coincides with the descriptions by [32,46].
With these techniques, it is possible to study how contamination affects the different components of an ecosystem depending on its characteristics, as in [63], which describes how the use of multivariate techniques (not considered as Machine Learning) reveals how the migration of inorganic pollution, both metallic and non-metallic, occurs.
From a broader perspective, these techniques enable us to identify the most serious contamination problems and to select the best bioremediation method, where possible. This was the case with the selection of the best plant species for the restoration of the dump in a manganese mine, where different aspects, such as the concentration of heavy metals, were studied. Apart from manganese, mainly cadmium, copper and zinc were found. After studying the accumulation of heavy metals, it was found that the best species for revegetation were Cynodon dactylon and Humulus scandens [64].

Conclusions
As general conclusions, the following can be mentioned: Regression tree models (CART) allowed the relative importance of the different soil components to be ranked, according to their relevance coefficients, for the adsorption and retention of 6 metals using 14 different soils. The cross-validation method improved the models in terms of R² for the retention and adsorption of all metals, except in the case of Zn retention, in which there was no change.
The behaviour of the different adsorption and retention models for the 6 studied metals differs, and the first variable splitting the regression tree varies considerably.
The metals can be sorted into two groups. One of them includes Cd, Ni and Zn, which generally show a high importance of the texture components, such as the percentage of sand. The other group, including Cr, Cu and Pb, shows that the exchange capacity, humified organic matter and the proportion of hematite are the most important variables, as in [32,46].
Lastly, in some cases there is co-adsorption of several metals, or two components showing a joint adsorption, as happens with gibbsite in the adsorption of chromium [60]. As for the co-adsorption of different metals, the analysis does not detect the co-adsorption of copper and zinc proposed by [38].
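The cross-validation step mentioned in the first conclusion can be sketched as follows. The exact scheme is not detailed in this section, so the example assumes k-fold cross-validation with folds assigned by index, and it uses a one-split tree on a single variable as a stand-in for a full regression tree; all function names are hypothetical. The cross-validated R² is computed from predictions made for each observation by a model fitted without that observation's fold.

```python
def k_fold_r2(xs, ys, fit, k=5):
    """Cross-validated R^2 from out-of-fold predictions.

    fit(train_xs, train_ys) must return a function that predicts y from x.
    """
    n = len(ys)
    press = 0.0  # sum of squared out-of-fold prediction errors
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        press += sum((ys[i] - predict(xs[i])) ** 2 for i in held_out)
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


def fit_mean(xs, ys):
    """Baseline model: a tree with no splits, predicting the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_stump(xs, ys):
    """One-split regression tree on a single variable (least-squares split)."""
    best = None
    levels = sorted(set(xs))
    for lo, hi in zip(levels, levels[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        m_left, m_right = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - m_left) ** 2 for y in left) + sum((y - m_right) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, thr, m_left, m_right)
    if best is None:  # all x values identical: fall back to the mean
        return fit_mean(xs, ys)
    _, thr, m_left, m_right = best
    return lambda x: m_left if x <= thr else m_right
```

Comparing `k_fold_r2(xs, ys, fit_stump)` with `k_fold_r2(xs, ys, fit_mean)` shows whether adding a split improves out-of-sample prediction. The same comparison, repeated across tree sizes, is the usual basis for choosing how far to prune a tree.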
Supporting Information
S1 Table. Original data from 42 soils. (PDF)