High efficiency classification of children with autism spectrum disorder

Autism spectrum disorder (ASD) is a wide-ranging collection of developmental disorders with varying symptoms and degrees of disability. Currently, ASD is diagnosed mainly with psychometric tools, which are often unable to provide an early and reliable diagnosis. Recently, biochemical methods have been explored as a means to meet this need. For example, an increased predisposition to ASD has been associated with abnormalities of metabolites in folate-dependent one carbon metabolism (FOCM) and transsulfuration (TS). Multiple metabolites in the FOCM/TS pathways have been measured, and statistical analysis tools employed to identify certain metabolites that are closely related to ASD. The prime difficulty in such biochemical studies comes from (i) inefficient determination of which metabolites are most important and (ii) understanding how these metabolites are collectively related to ASD. This paper presents a new method based on scores produced in Support Vector Machine (SVM) modeling combined with High Dimensional Model Representation (HDMR) sensitivity analysis. The new method effectively and efficiently identifies the key causative metabolites in the FOCM/TS pathways, ranks their importance, and discovers their independent and correlative action patterns upon ASD. Such information is valuable not only for providing a foundation for a pathological interpretation but also for potentially providing an early, reliable diagnosis, ideally leading to a subsequent comprehensive treatment of ASD. With only tens of SVM model runs, the new method can identify the combinations of the most important metabolites in the FOCM/TS pathways that lead to ASD. Previous efforts to find these metabolites required hundreds of thousands of model runs with the same data.


Introduction
Autism Spectrum Disorder (ASD) is a serious developmental disorder characterized by difficulty in socializing, communicating, and interacting with others. According to research by the Centers for Disease Control and Prevention, in 2000 an average of 1 in every 150 American children was diagnosed with autism, while by 2012 the figure was about 1 in every 68 (≈1.5%) [1]. Some symptoms of ASD are not evident until age two or later. In other cases, a child may appear to be developing normally until age two, and then may stop learning new skills, or may even forget old skills [2]. Psychometric tools are often used to diagnose ASD.
This wrapper approach has several drawbacks: 1) there are 190,050 combinations of up to 6 members selected from the 24 metabolites, and evaluating all of these combinations requires running FDA 190,050 times, which is very computationally demanding; if the number of variables (metabolites here) is large (e.g., hundreds or thousands, as often occurs in various biochemical data sets), testing all possible combinations is infeasible; 2) the wrapper method provides no information about which members of the identified metabolite set contribute the most to the classification of ASD; and 3) the procedure does not reveal the relationship between the identified metabolites, i.e., whether they contribute independently or correlatively. The problems posed above are a general challenge in machine learning: feature (here, metabolite) selection, prioritization and correlation identification. Note that in the remainder of the paper we will use the words metabolites, features, and (input) variables interchangeably, in order to maintain the terminology standardly used with the various algorithms in the paper.
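As a side note, the 190,050 figure is simply the number of subsets of up to 6 members drawn from the 24 metabolites, which can be checked directly (Python is used here purely for illustration):

```python
# Count the metabolite subsets of size 1..6 drawn from the 24 measured
# metabolites; the wrapper approach runs one FDA model per subset.
from math import comb

n_metabolites = 24
total = sum(comb(n_metabolites, k) for k in range(1, 7))
print(total)  # 190050
```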
Feature selection is the process of finding a small subset of significant variables that have good classification performance. Feature selection algorithms may be conveniently grouped into two categories: filter and wrapper methods. In contrast to wrapper methods, filter methods rely on the application of a univariate criterion to each feature separately in order to select the important feature subsets without running the chosen learning algorithm [11].
For example, the t-test is the filter method most commonly used for feature selection and prioritization [12]. In this case, the data are separated into two sets according to their grouping, e.g., "NEU" and "ASD". Then, a comparison of the two data sets for each feature x_i by t-test is performed under the null hypothesis that the two data sets for x_i are drawn from the same normal distribution. The p-value obtained in the t-test for each x_i is used as a univariate criterion of how effective x_i is at separating the groups: the larger the p-value for x_i, the less effective x_i is. In this fashion, the magnitudes of the p-values for all features inversely define their prioritization order. An appropriate threshold on the p-value needs to be set for feature selection. If the p-value is close to zero (e.g., less than the threshold 0.05 or 0.01), the null hypothesis is rejected, i.e., the two data sets of x_i may not come from the same distribution, and x_i is most probably a causative feature. Otherwise, x_i is not a causative feature and can be removed. However, for experimental and clinical data, the assumption of a normal distribution may not be valid. Furthermore, if the sample size is small, the comparison of the two data set distributions is often not reliable; in this case the t-test may not give a correct answer.
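The t-test filter just described can be sketched as follows; the data here are synthetic stand-ins for the NEU/ASD groups, with one feature given an artificial mean shift so that it acts as the causative one:

```python
# Hypothetical illustration of t-test filtering: rank features by the
# p-value of a two-sample t-test between the NEU and ASD groups.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_neu, n_asd, n_features = 76, 83, 5
X_neu = rng.normal(size=(n_neu, n_features))
X_asd = rng.normal(size=(n_asd, n_features))
X_asd[:, 0] += 1.5                     # feature 0 made causative by a mean shift

p_values = np.array([ttest_ind(X_neu[:, i], X_asd[:, i]).pvalue
                     for i in range(n_features)])
ranking = np.argsort(p_values)         # smallest p-value = highest priority
selected = np.where(p_values < 0.05)[0]
```

With the artificial mean shift, feature 0 receives a near-zero p-value and heads the ranking, while the noise features are typically filtered out.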
Compared to wrapper methods, filter methods are simple and fast, as they do not need to run a learning algorithm. Moreover, because filter methods treat each feature separately, the number of features does not influence their performance, so filter methods can handle very high dimensional systems.
In this paper we propose a new two stage method based on two univariate criteria: (i) the sensitivity index, main effect Ŝ_i (i = 1, 2, ..., n), defined by the variance-based method [13-16], and (ii) the structural (independent) sensitivity index S^a_i (i = 1, 2, ..., n), deduced from Structural and Correlative Sensitivity Analysis (SCSA) based on HDMR [17, 18]. This dual analysis method can treat data of small size and with an arbitrary probability distribution. The proposed method is a special filter method based on the Support Vector Machine (SVM) learning algorithm. For illustration of the proposed method, the same ASD data used by Howsmon et al. [8] will be treated by this new two stage method.
Both Ŝ_i and S^a_i are positive quantities; the larger the value of Ŝ_i or S^a_i, the more important x_i is concluded to be. Thus, the magnitudes of Ŝ_i or S^a_i lead to a prioritization order for the input variables. Moreover, the additional correlative sensitivity index S^b_i from SCSA discovers the correlative action patterns of the identified metabolites upon ASD. With only tens of SVM model runs, the new method identifies the combinations of the most important metabolites in the FOCM/TS pathways that lead to ASD. In contrast, to find these metabolites, Howsmon et al. performed hundreds of thousands of FDA model runs. Furthermore, the information about the importance order and the correlative action patterns of the identified metabolites for ASD predisposition provides valuable additional insight for a deeper understanding of the ASD mechanism and a possible future path to its treatment. The newly introduced analysis tools are general and should be applicable to other diseases requiring a similar analysis to reveal their biological origins.

Methods
To understand the two stage method for feature selection, prioritization and correlation identification, we need some knowledge of HDMR sensitivity analysis.
Here the principles of HDMR sensitivity analysis are briefly summarized; the details can be found in references [17-20]. We will consider sensitivity indexes of a continuous output y with respect to the input variables x = (x_1, x_2, ..., x_n)^T of a model y = f(x), where f is the function for x → y. In classification, the output y is a categorical variable representing labels, like [NEU, ASD] or equivalently [−1, 1]. In this case, sensitivity analysis utilizing the two proposed indexes, Ŝ_i and S^a_i, cannot be readily performed. Fortunately, in many classification learning algorithms (e.g., FDA and SVM), the implicit output y actually is a continuous variable referred to as the score. Fig 1 gives the output score of the SVM model with a linear kernel for the ASD-NEU data using all 24 metabolites as the input variables x.
The explicit output of SVM is given by the sign function of the score, i.e., ŷ = sign(score(x)). Using the SVM classification score as the continuous output, the classification problem may be treated by regression (Support Vector Regression (SVR)) [21], and sensitivity analysis can be readily performed. Fig 2 plots the score of the SVM model with all 24 metabolites as input variables x versus the variable x_24 (% oxidized glutathione) in the ASD-NEU data.
Following this procedure we may identify the most important feature as the one whose variation has the largest influence on the variation of the output score.
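A minimal sketch of this idea with scikit-learn, where synthetic data stand in for the 24 metabolites and the decision_function value plays the role of the continuous SVM score:

```python
# The continuous "score" behind a linear-kernel SVM classifier is its
# decision_function value; sign(score) gives the explicit class label.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(159, 24))          # synthetic stand-in for the metabolites
y = np.where(X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=159) > 0, 1, -1)

svm = SVC(kernel="linear").fit(X, y)
score = svm.decision_function(X)        # continuous output used for sensitivity analysis
labels = np.sign(score)                 # explicit SVM output, matches svm.predict(X)
```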

Sensitivity index: Main effect Ŝ_i
The sensitivity index, main effect Ŝ_i, is a commonly used measure for ranking the importance of input variables, defined by the variance-based method as [13-16]

Ŝ_i = V_{x_i}( E_{x_{−i}}( y | x_i ) ) / V( f(x) ),

where V_{x_i} and E_{x_{−i}} denote the conditional variance and conditional expectation operators with respect to x_i and x_{−i} = (x_1, x_2, ..., x_{i−1}, x_{i+1}, ..., x_n)^T, respectively, and V(f(x)) denotes the unconditional variance of the output. The output y = f(x) is a continuous variable given by the function f(x). Ŝ_i reflects the portion of the output variance caused by the variation of the input variable x_i: the larger the value of Ŝ_i, the more x_i contributes to the variance of the output. Thus, Ŝ_i is a well established univariate criterion to rank the importance of input variables and can be used for feature selection and prioritization. The determination of Ŝ_i by the traditional variance-based methods above is quite computationally demanding and requires a large number (thousands or more) of specifically designed samples based on assumed knowledge of the probability distribution of the input variables [14-16, 22, 23]. However, for experimental and clinical data, the input probability distributions are often explicitly unknown. Therefore, traditional variance-based methods cannot be used to treat the latter types of data. A new algorithm has been developed to estimate Ŝ_i from a limited number of experimental or clinical samples without requiring explicit knowledge of the probability distribution of the input variables [19].
First, the variable x_i is transformed to a new independent variable z_i uniformly distributed in [0, 1] by the Rosenblatt transformation [24]

z_i = F_i( x_i ),

where F_i denotes the cumulative distribution function (cdf) of x_i. Many numerical methods have been developed for the empirical determination of a cdf from the data [25-27]; Matlab has a code ecdf for this purpose. As z_i is an independent variable, the first order HDMR component function f_i(z_i) for z_i can then be determined as [19]

f_i( z_i ) = E( y | z_i ) − ȳ,

where ȳ is the mean value of the output y over all samples. This procedure is equivalent to determining f_i(z_i) by least squares regression from z_i and all outputs f(z_i, x_{−i}) at the same value of z_i. Fig 3 gives the least squares regression for f_24(z_24) with respect to z_24 in the ASD-NEU data.
As z_i is a function of x_i only, the main effect Ŝ_i for x_i can be estimated as [19]

Ŝ_i ≈ Σ_{s=1}^{N} [ f_i( z_i^{(s)} ) ]^2 / Σ_{s=1}^{N} ( y^{(s)} − ȳ )^2,

where z_i^{(s)} and y^{(s)} are the values of z_i and y in the sth sample, and N is the total number of samples used to determine Ŝ_i.
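This empirical estimator can be sketched compactly; here a rank-based empirical cdf supplies z_i, and a cubic polynomial is an illustrative stand-in for the least squares fit of f_i(z_i), applied to synthetic data:

```python
# Hedged sketch of the empirical main-effect estimate: z_i = F_i(x_i) via
# the empirical cdf, f_i(z_i) fitted by least squares, and S_hat_i taken as
# the ratio of variance explained by f_i to the total output variance.
import numpy as np

def main_effect(x_i, y, degree=3):
    # empirical cdf via ranks: z_i in [0, 1]
    z = np.argsort(np.argsort(x_i)) / (len(x_i) - 1.0)
    yc = y - y.mean()
    coef = np.polyfit(z, yc, degree)     # polynomial stand-in for f_i(z_i)
    f_i = np.polyval(coef, z)
    return float(np.sum(f_i ** 2) / np.sum(yc ** 2))

rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
y = x1 + 0.1 * rng.normal(size=500)      # output driven almost solely by x1
s1, s2 = main_effect(x1, y), main_effect(x2, y)
```

For this toy model, s1 comes out near 1 and s2 near 0, matching the interpretation of Ŝ_i as the output-variance share of x_i.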
Since Ŝ_i is determined separately for each x_i, the number of input variables does not influence its determination. Thus, Ŝ_i can be used to treat very high (e.g., thousands or more) dimensional systems. Still, a shortcoming of using Ŝ_i is that it contains the contributions from other x_j's correlated with x_i [19]. When the correlation is positive (negative), Ŝ_i is larger (smaller) than the independent contribution of x_i. The resulting ordering of the features from Ŝ_i may then incorrectly represent the independent contributions of the features. Therefore, Ŝ_i will be used first in the two stage method for feature pre-selection to remove the most insignificant input variables, which is especially important when the number of input variables is large.

Determination of SCSA indexes
SCSA is based on HDMR with independent and/or correlated input variables [17]. A newly developed svr-based HDMR algorithm with independent input variables and known probability distributions of the inputs is efficient for HDMR modeling and sensitivity analysis with a modest number of samples [20]. As the variables in experimental and clinical data are often correlated and their probability distributions are explicitly unknown, here we extend the above svr-based HDMR algorithm to correlated variables. As shown below, without knowledge of the variable probability distribution, the first order HDMR expansion with correlated variables can still be constructed from experimental and clinical data, and it will be used to determine the first order SCSA indexes for the ASD-NEU data.
HDMR and SCSA. Many problems in science and engineering reduce to the need for efficiently and functionally describing the relationship between a set of high dimensional system input variables x = (x_1, ..., x_n)^T and the system output y = f(x). As the contributions of the multiple input variables upon the output can act individually and interactively, it is natural to express the (explicitly known or unknown) function f(x) as a finite hierarchical expansion [18]:

f(x) = Σ_{u ⊆ {1, 2, ..., n}} f_u( x_u ),

where u is a subset of {1, 2, ..., n} including the empty set ∅ (i.e., f_∅(x_∅) = f_0), and x_u are the elements of x whose indexes are in u. For simplicity, in the sequel we will write u ⊆ n in place of u ⊆ {1, 2, ..., n}. When the component functions satisfy the hierarchical orthogonality condition (i.e., they are optimally defined to maximize the contribution of the low order component functions), the above expansion is referred to as an HDMR expansion. For many systems, the higher order HDMR component functions are negligible, and f(x) can be approximated by a low (e.g., first or second) order HDMR expansion. For the ASD-NEU metabolite system in this paper, the first order HDMR expansion

f(x) ≈ f_0 + Σ_{i=1}^{n} f_i( x_i )

was found to give a very good approximation, where f_0 is a constant representing the mean contribution of all input variables, and f_i(x_i) represents the contribution of x_i to the output. Based on a covariance decomposition of the unconditional variance of the output, a general global sensitivity analysis for independent and correlated variables, referred to as structural (independent) and correlative sensitivity analysis (SCSA), was proposed [17, 18].
V( f(x) ) = Σ_{u ⊆ n, u ≠ ∅} Cov( f_u( x_u ), f(x) ),    (9)

where Cov(·) denotes covariance, and the property of zero expectation, E(f_u(x_u)) = 0, of the HDMR component functions was used. The SCSA sensitivity indexes are defined by normalization, i.e., by dividing both sides of Eq (9) by V(f(x)).
Here, for x_u, we denote S^a_u as the structural (independent) contribution (i.e., related to f_u(x_u) and the marginal probability density function (pdf) p_u(x_u) only), S^b_u as the correlative contribution (i.e., related to f_u(x_u), the other component functions f_v(x_v) and the joint pdf p(x)), and S_u as the total contribution, equal to

S_u = S^a_u + S^b_u.

In particular, for u = {i}, we have the first order SCSA indexes for the variable x_i. We can also consider the correlation of each pair of variables by computing

S^b(ij) = Σ_{s=1}^{N} f_i( x_i^{(s)} ) f_j( x_j^{(s)} ) / Σ_{s=1}^{N} ( y^{(s)} − ȳ )^2,  i, j ∈ {1, 2, ..., n}, i ≠ j.    (13)

Note that SCSA separates the independent and correlative contributions of the input variable x_i. In particular, S^a_i is referred to as the structural index, giving the independent contribution of the input variable x_i to the variation of y, while the S^b_i index gives the correlative contribution of x_i arising from all other variables x_j correlated with x_i. Hence, S^a_i is an ideal univariate criterion for feature selection and prioritization, and S^b_i is used for correlation identification. S^b(ij) can be considered as a nonlinear correlation coefficient for the variables x_i and x_j. S_i is referred to as the total index. When the output is satisfactorily approximated by the first order HDMR expansion, the sum of all total indexes should satisfy [18]

Σ_{i=1}^{n} S_i ≈ 1.    (14)

The closer Σ_i S_i is to 1, the more reliable the first order sensitivity analysis is deemed to be. Furthermore, the first order HDMR component f_i(x_i), as a function of x_i, provides the influence pattern of x_i upon the output y.
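The first order SCSA indexes can be sketched as below, assuming the component-function values f_i(x_i^(s)) are already in hand; in this toy additive model y = x_1 + x_2 with correlated inputs, the centered inputs serve exactly as the components:

```python
# Sketch of the first order SCSA indexes from sample values of the
# component functions f_i(x_i^(s)) and the output y.
import numpy as np

def scsa_indices(F, y):
    # F: (N, n) matrix of component values f_i(x_i^(s)), columns zero-mean
    yc = y - y.mean()
    V = np.sum(yc ** 2)
    Sb_mat = F.T @ F / V              # off-diagonal entries ~ S^b(ij)
    Sa = np.diag(Sb_mat).copy()       # structural (independent) indexes S^a_i
    np.fill_diagonal(Sb_mat, 0.0)
    Sb = Sb_mat.sum(axis=1)           # correlative indexes S^b_i
    return Sa, Sb, Sa + Sb            # and total indexes S_i

rng = np.random.default_rng(3)
x1 = rng.normal(size=2000)
x2 = x1 + 0.5 * rng.normal(size=2000)    # strongly correlated inputs
y = x1 + x2
F = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])
Sa, Sb, S = scsa_indices(F, y)
```

Because the first order expansion is exact for this toy model, the totals sum to 1 (Eq (14)), and the nonzero S^b_i values expose the input correlation.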
The advantage of the first order SCSA indexes, S^a_i, S^b_i, S_i, lies in their ability to perform feature selection, prioritization and correlation identification to good accuracy even with small data samples (e.g., in the present case, there are 159 points of ASD-NEU data). SCSA requires the construction of an HDMR model utilizing all input variables, and thus it is more reliable than Ŝ_i or the t-test, which only employ information for each input variable separately. This consideration becomes significant when the sample size is small.
A shortcoming of performing SCSA is that it is difficult to treat very high dimensional systems, because construction of an HDMR model with thousands of input variables is computationally intensive. Therefore, as remarked earlier, Ŝ_i may be used first for feature pre-selection when the number of features is large, followed by SCSA for refined feature selection, prioritization and correlation identification. If the number of features is not large, the feature pre-selection by Ŝ_i may be avoided.
svr-based HDMR algorithm with independent input variables. The function f(x) is approximated by SVR as [21]

f(x) ≈ b + Σ_{s=1}^{N} ( α_s − α*_s ) K( x^{(s)}, x ),

where K(x^{(s)}, x) is referred to as a kernel. When an HDMR kernel, i.e., an HDMR expansion of kernels with different numbers of variables [20],

K( x^{(s)}, x ) = c + Σ_i K_i( x_i^{(s)}, x_i ) + Σ_{i<j} K_i( x_i^{(s)}, x_i ) K_j( x_j^{(s)}, x_j ) + ...,

where c ≥ 0 and the kernels K_i(x_i^{(s)}, x_i) satisfy the zero expectation and mutual orthogonality conditions, is used in Eq (16), the svr-based HDMR expansion is obtained. This result shows that all non-constant HDMR component functions are represented as linear combinations of one variable kernels, or of products of one variable kernels, with combination coefficients α_s and α*_s. Thus, an HDMR model can be constructed by using an SVR algorithm, i.e., by determining the unknown parameters α_s and α*_s, which is more efficient because fewer samples are adequate. This method is referred to as the svr-based HDMR algorithm. The key step is the construction of HDMR kernels satisfying the zero expectation and mutual orthogonality conditions. Various (polynomial, radial basis, exponential, Fourier) analytical HDMR kernels for independent variables with a known probability distribution of the variables have been constructed [20].
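A rough sketch of the first order svr-based HDMR idea, using per-variable Gaussian kernels summed into an additive kernel and scikit-learn's precomputed-kernel SVR; the kernel choice, bandwidth, and test function are illustrative assumptions, and the zero-expectation centering of the kernels is omitted here for brevity:

```python
# First-order (additive) HDMR-style kernel: a constant plus a sum of
# one-variable kernels, fitted with SVR via a precomputed Gram matrix.
import numpy as np
from sklearn.svm import SVR

def one_var_kernels(X, Xp, gamma=2.0):
    # one N x M kernel matrix per input variable
    return [np.exp(-gamma * (X[:, [i]] - Xp[:, [i]].T) ** 2)
            for i in range(X.shape[1])]

def hdmr_kernel(X, Xp, c=1.0, gamma=2.0):
    return c + sum(one_var_kernels(X, Xp, gamma))

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2       # additive test function

model = SVR(kernel="precomputed", C=10.0).fit(hdmr_kernel(X, X), y)
y_hat = model.predict(hdmr_kernel(X, X))
mse = float(np.mean((y_hat - y) ** 2))
```

Because the kernel is additive, the fitted SVR decomposes into one-variable component functions, which is the property the svr-based HDMR algorithm exploits.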
Extension of svr-based HDMR to correlated input variables. In most realistic experimental and clinical circumstances, the variables are correlated. For example, the occurrence of one causative metabolite is often accompanied by the occurrence of other causative metabolites. Moreover, the input variable probability distribution for experimental and clinical data is often explicitly unknown, and hence the analytical HDMR kernels cannot be constructed. Therefore, we need to extend the svr-based HDMR algorithm from independent to correlated input variables without requiring knowledge of the input variable probability distribution. For the first order svr-based HDMR with correlated variables, we only need to construct single variate HDMR kernels, without requiring knowledge of the pdf of the input variables.
The HDMR kernel satisfies the zero expectation property, and its general formula is given by [28]

K_i0( x_i, x_i' ) = K_i( x_i, x_i' ) − [ ∫ K_i( x_i, t ) p_i(t) dt · ∫ K_i( s, x_i' ) p_i(s) ds ] / ∫∫ K_i( s, t ) p_i(s) p_i(t) ds dt.    (19)

To construct HDMR kernels, we need to know the pdfs, p_i(x_i), for determination of the integrals in Eq (19). For real data, the pdf is likely correlated and unknown, but it is implicitly contained in the sampled data x^{(s)} (s = 1, 2, ..., N), because the samples are drawn according to their probability distribution. Thus, without explicit knowledge of the pdf, we can construct the one-variate HDMR kernel numerically. The integrals in Eq (19) are approximated by Monte Carlo integration over the N × N sample kernel matrix with elements K_i(x_i^{(k)}, x_i^{(t)}) (k, t = 1, 2, ..., N): the first integral in the numerator of Eq (19) is approximated by the average value of all the elements of the kth row of this matrix; in the same fashion, the average value of all the elements of the kth column is an approximation to the second integral in the numerator; and the integral in the denominator of Eq (19) is approximated by the average value of all the elements in the matrix.
As the above matrix is symmetric, the average values of all rows are just the average values of all columns. Therefore, the one-variate HDMR kernels can be constructed numerically from the N row averages and the average of all elements of the matrix, no matter whether the variables are independent or correlated and regardless of the pdf they possess.
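The numerical construction just described amounts to centering the sample kernel matrix with its row averages, column averages, and grand average; a short sketch (the rank-one correction form below is one plausible reading of the general formula in [28]):

```python
# Numerically center a one-variable kernel matrix so that its empirical
# expectation over the samples is zero, without knowing the pdf.
import numpy as np

def center_kernel(K):
    row = K.mean(axis=1, keepdims=True)   # ~ first numerator integral, per row
    col = K.mean(axis=0, keepdims=True)   # ~ second numerator integral, per column
    return K - row @ col / K.mean()       # denominator ~ grand average

rng = np.random.default_rng(5)
x = rng.normal(size=(100, 1))             # samples of one input variable
K = np.exp(-(x - x.T) ** 2)               # raw one-variable kernel matrix
K0 = center_kernel(K)
```

After centering, every row and column of K0 averages to zero, which is the empirical counterpart of the zero expectation condition.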
For an arbitrary x i , the integrals in the numerator can be obtained by interpolation from the N average values for the x ðsÞ i 's. Using the one-variate HDMR kernels with correlated input variables, the first order HDMR expansion with correlated variables can be constructed by determining the parameters α and α Ã with the SVR algorithm. The SCSA indexes are then computed from the resultant HDMR component functions, and used for sensitivity analysis.
The procedure of the two stage method. The procedure for feature selection, prioritization and correlation identification for ASD-NEU classification by the new proposed two stage method is as follows: • First, construct an SVM model with a properly chosen kernel (in the present ASD-NEU data, a linear kernel was found to be satisfactory) using all (24) metabolites as the input variables x whose values are normalized to [−1, 1], and the output y takes the values −1 and 1 to represent NEU and ASD, respectively. Then, collect the SVM scores for all 159 samples (76 normal children and 83 patients).
• For the present ASD-NEU data, the total number of 24 metabolites is not large, and one can directly use SCSA without feature pre-selection by Ŝ_i. However, for illustration of the general two stage method we will still use Ŝ_i pre-selection first to remove some of the most unimportant metabolites, defined as those with the smallest Ŝ_i values. Then, the first order svr-based HDMR expansion with all retained x_i's is constructed, and the corresponding first order SCSA indexes S^a_i, S^b_i, S_i are computed to perform a final refined stage of feature selection, prioritization, and correlation identification.
• The selection of significant metabolites is performed by a bottom-up method, i.e., removing the most insignificant metabolites first, with the smallest values of Ŝ_i and then with the smallest values of S^a_i, step-by-step, to obtain the final significant metabolites. There is no strict threshold on the magnitude of Ŝ_i or S^a_i for removing insignificant metabolites. The guiding rule is that removal of metabolites should not significantly reduce the classification accuracy of the new SVM model built with the retained metabolites. The reason for using the bottom-up method is that the importance order of metabolites obtained from either Ŝ_i or S^a_i depends on the SVM scores, which are a function of all the metabolites used in the SVM construction. We found that the order of the top significant metabolites is generally contaminated by including less-informative metabolites, and the result is not the same as their true order when only significant metabolites are used in SVM model construction. However, even if less-informative metabolites are included in SVM model construction, we may still reasonably assume that the metabolites with the smallest values of Ŝ_i or S^a_i are the most insignificant and can be removed. As mentioned above, the smallest Ŝ_i does not necessarily mean that x_i is the most insignificant metabolite, because the smallest value of Ŝ_i may arise from negative correlations of x_i with other x_j's. In this case, removing x_i may result in a significant reduction of the classification accuracy of the new SVM model. If so, we keep x_i and remove the x_j with the next smallest value Ŝ_j, just above Ŝ_i.
• After removing some identified possibly insignificant metabolites, a new SVM model is constructed with the remaining metabolites, and the new SVM scores are collected to perform the above procedure again for further metabolite removal. The process continues until no more metabolites can be removed without a significant reduction in the accuracy of the next SVM model. The removal order of the metabolites is the inverse of their importance order, i.e., the earliest removed metabolites are deemed the least significant. Thus, the importance order of the removed metabolites can be obtained by inverting their removal order.
• The final remaining metabolites are the most significant, and the values of S^a_i give their importance order based on their independent contributions. The index S^b_i provides the correlative contribution of x_i with all other variables x_j correlated with x_i, and S^b(ij) gives the pairwise correlation between x_i and x_j. The sum of the S_i indicates the reliability of the overall sensitivity analysis. The first order HDMR component function f_i(x_i), plotted versus x_i, provides the influence pattern of x_i on the predisposition to ASD.
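The steps above can be sketched as a backward-elimination loop; the sensitivity function below is a crude univariate proxy (correlation of each feature with the SVM score) standing in for the Ŝ_i / S^a_i computation, and the data are synthetic with two causative features:

```python
# Bottom-up removal sketch: drop the feature with the smallest sensitivity,
# rebuild the SVM, and stop when cross-validated accuracy drops notably.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def sensitivity(model, X, idx):
    # crude proxy for the HDMR-based indexes used in the paper
    s = model.decision_function(X)
    return abs(np.corrcoef(X[:, idx], s)[0, 1])

rng = np.random.default_rng(6)
X = rng.normal(size=(159, 8))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # only two causative features

features = list(range(X.shape[1]))
acc = cross_val_score(SVC(kernel="linear"), X, y, cv=10).mean()
while len(features) > 2:
    model = SVC(kernel="linear").fit(X[:, features], y)
    worst = min(range(len(features)),
                key=lambda k: sensitivity(model, X[:, features], k))
    trial = [f for i, f in enumerate(features) if i != worst]
    trial_acc = cross_val_score(SVC(kernel="linear"), X[:, trial], y, cv=10).mean()
    if trial_acc < acc - 0.03:               # significant accuracy drop: stop
        break
    features, acc = trial, trial_acc
```

The loop strips away noise features while the two causative features survive, mirroring the guiding rule that removal must not degrade the classifier.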

Feature pre-selection utilizing the main effect Ŝ_i
The total number of 24 metabolites is not large, and we could directly perform SCSA. However, we still use the main effect Ŝ_i to remove the most unimportant metabolites first, as an illustration of the general two stage procedure, which is especially valuable in situations with large numbers of input variables. The resulting Ŝ_i values for the 24 metabolites are given in Table 1; the particular metabolites of interest in these figures will be specified in the text discussion, as called for. The metabolites at the end of the Ŝ_i sequence in Table 1, having the smallest Ŝ_i values, may be removed. We tested the removal of more, i.e., removing 3 or 4 further metabolites (i.e., starting from the 16th or 15th position to the end of the Ŝ_i sequence), but removing x_5 (% DNA methylation) in the 15th position caused a significant reduction in the accuracy of the new SVM model, so only three further metabolites (x_2 (SAM), x_3 (SAH), x_21 (fCystine)) were removed, resulting in a new subset of 15 metabolites. This overall procedure illustrates the means for systematic pre-selection of the likely significant metabolites utilizing Ŝ_i.

Feature selection and prioritization by the SCSA index S^a_i
We now transfer from the Ŝ_i-based feature pre-selection procedure to the use of S^a_i, S^b_i, S_i for more refined feature selection, prioritization and correlation identification. Using the score of the SVM model as the output y and the 15 metabolites pre-selected by Ŝ_i as the input variables, the first order HDMR expansion was constructed by the svr-based HDMR algorithm with correlated variables, and the corresponding first order SCSA indexes S^a_i, S^b_i, S_i were computed. The 15 S^a_i's are arranged in decreasing order of magnitude in Fig 6, from which we see that the last few S^a_i's are very small, suggesting that their corresponding metabolites may be removed. Removal of metabolites should be carefully performed such that the accuracy of the new SVM model with the retained metabolites is not significantly influenced; in this way, significant metabolites will not be mistakenly removed. A reliable procedure is removal of one metabolite at a time, though this may not be efficient when the number of metabolites is large. In the present case, with just 15 metabolites, we did remove insignificant metabolites one-by-one. Thus, in seven separate sequential steps, x_14 (GSSG), x_13 (fGSH), x_12 (tGSH), x_15 (fGSH/GSSG), x_4 (SAM/SAH), x_9 (Cysteine) and x_1 (Methionine) were removed to obtain a set of 8 metabolites. For brevity, the detailed results of the seven steps are not given here. Only the SCSA indexes, arranged in decreasing order of S^a_i, for the subsequent few steps in going from 8 to 6 remaining metabolites are given below in Tables 3-5. From Tables 3-5, we see that x_18 (Nitrotyrosine) and x_16 (tGSH/GSSG) were removed sequentially. The importance order of the six metabolites in Table 5 is different from that in Tables 3 and 4.
As discussed before, the order of the top significant metabolites is likely contaminated by the less-informative metabolites included in the SVM model construction; the true order is only obtained when the actually significant metabolites are used in SVM model construction. This is why the bottom-up method is utilized to correctly obtain the most important metabolites.
According to the values of the S^a_i's given in Table 5, x_6 (8-OHG) could be considered as another candidate for removal. However, upon doing so, the accuracy of the SVM model with the first 5 metabolites in Table 5 significantly decreased. The prediction error of the SVM model is represented by the mean classification error (MCE), covering the misclassifications of both ASD and NEU samples:

MCE = (number of misclassifications) / (total number of data).

The MCEs of the SVM models with 4 to 8 metabolites, obtained from all 159 samples with 10-fold cross-validation, are given in Table 6, which shows that a significant increase in SVM prediction error occurs starting from 5 metabolites. This implies that x_6 (8-OHG) is still a causative metabolite for ASD, and we have conservatively retained it. Thus, the six metabolites in Table 5 are chosen as the final significant metabolites. As explained earlier, their importance order, defined by their independent contributions, is determined by their S^a_i values. As remarked before, the bottom-up method of feature removal gives the importance order of all removed metabolites inversely at each step. Thus, combining the importance order of the six identified significant metabolites given in Table 5 with the importance order of all removed metabolites, we obtain the overall prioritization order for all 24 metabolites in Table 7.
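Since the MCE is just 1 minus the classification accuracy, the 10-fold cross-validated MCE can be sketched as follows (synthetic data standing in for the 6 metabolites):

```python
# 10-fold cross-validated mean classification error (MCE) of a linear SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(159, 6))                  # stand-in for 6 metabolites
y = np.where(X[:, :2].sum(axis=1) + 0.2 * rng.normal(size=159) > 0, 1, -1)

acc = cross_val_score(SVC(kernel="linear"), X, y, cv=10)
mce = 1.0 - acc.mean()                         # MCE = 1 - mean accuracy
```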
The metabolites close to the end of this prioritization sequence are the least informative for ASD. If Ŝ_i were not used for feature pre-selection and only S^a_i were employed, the prioritization order of the metabolites in the last several positions might be different, but this difference is unimportant because these metabolites are all insignificant no matter what order they have.
The differences between the S^a_i values of the 6 most important metabolites given in Table 5 are not very large; in particular, for the middle 4 metabolites, the difference between neighboring metabolites is only ≈0.01. Such a small difference may be real, or it may be caused by the small sample size (159 samples), experimental errors (measurements of the metabolites) and numerical errors (construction of the SVM and HDMR models). Therefore, their order may not be very significant, and possibly some of them are almost equally important.
Ignoring the precise orderings, the resulting best 7 and 6 metabolite subsets from Table 7 are exactly the same as those given in Table 2 reported by Howsmon et al. [8]. However, to obtain these optimal sets of metabolites, Howsmon et al. performed an enormous number of Fisher discriminant analysis classification tests, whereas we obtained the same results in only a few steps. Note that the first 5 metabolites in the sequence given in Table 7 differ from those given by Howsmon et al. As shown below, our selection of the 5 most important metabolites provides better accuracy in ASD classification than that selected by Howsmon et al. This result implies that the new method more reliably identifies the significant metabolites.
To test the validity of our selections, the prediction error of the SVM models with the best combination of k(= 5-8) metabolites chosen by the first k metabolites in Table 7 was computed. For comparison, the prediction error of the SVM model with the 5 metabolites chosen by Howsmon et al. is also included.
To determine the prediction error, Howsmon et al. used leave-one-out cross-validation [8]. Leave-one-out may be a good way to construct a model from a limited amount of data, but it is not ideal for estimating the prediction error, because every datum still participates in the model construction. We therefore use a different method. All 159 data points are randomly separated into a training subset (100 points) and a testing subset (59 points). For a given subset of metabolites, the SVM model with a linear kernel is constructed from the training data alone with 10-fold cross-validation, and then used to predict the testing data, which were not involved in the model construction. We ran these tests 100 times and calculated the mean and standard deviation of the 100 MCE's for the training and testing data of the SVM models with different numbers of metabolites. The results are given in Table 8.
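This train/test protocol can be sketched as follows. Again, this is an illustrative sketch with synthetic data, not the authors' code; for simplicity it fits the linear SVM with its default regularization rather than tuning it by the inner 10-fold cross-validation described above.

```python
# Sketch of the evaluation protocol: 100 random splits of the 159 samples
# into 100 training and 59 testing points, a linear-kernel SVM fit on the
# training data only, and the mean/std of the resulting MCE's.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(159, 6))                              # synthetic metabolites
y = np.where(X[:, 0] - X[:, 1] + 0.3 * rng.normal(size=159) > 0, 1, -1)

train_mces, test_mces = [], []
for seed in range(100):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=100, test_size=59, random_state=seed)
    clf = SVC(kernel="linear").fit(X_tr, y_tr)             # training data only
    train_mces.append(np.mean(clf.predict(X_tr) != y_tr))
    test_mces.append(np.mean(clf.predict(X_te) != y_te))   # held-out error

print(f"train MCE: {np.mean(train_mces):.3f} +/- {np.std(train_mces):.3f}")
print(f"test  MCE: {np.mean(test_mces):.3f} +/- {np.std(test_mces):.3f}")
```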
In Table 8, the mean MCE is less than 2% for the training data and less than 4% for the testing data (corresponding to averages of 2 and 2.36 misclassifications, respectively) when the SVM model contains 6 or more metabolites. The prediction accuracy is therefore above 98% and 96% for the training and testing data, respectively, which is satisfactory. The results in Table 8 also show that the MCE decreases as the number of metabolites increases. The MCE's for the 5 metabolites selected by our method and by Howsmon et al. are very close, but our selection has slightly better accuracy.

Correlation identification by the SCSA indexes S^b_i and S^b(ij)
Correlation of metabolites means simultaneous variation, or variation dependency, of the metabolites, which may arise from different sources: for example, correlated metabolites may lie on the same pathway leading to ASD predisposition, or correlated metabolites may promote or inhibit one another. Doctors and researchers working on ASD may be able to interpret these correlations correctly. Discovering such correlations is important for a pathological interpretation and comprehensive treatment of ASD. The matrix S^b(ij) (i, j = 1, 2, . . ., n) gives all the information on metabolite pairwise correlation (see Eq (13)). The S^b(ij) matrix for the 6 important metabolites is given in Eq (21), where the metabolite order is x_5, x_24, x_10, x_17, x_23, x_6, as in Table 5. Note that the diagonal elements are the S^a_i's; the sum of all elements except the diagonal one in the ith row (or column) of the S^b(ij) matrix is S^b_i, i.e., the sum of all pairwise correlative contributions of x_i with the other x_j's; and the sum of all elements in the ith row (or column) of the S^b(ij) matrix is S_i. Table 5 shows that (i) the magnitudes of all six S^b_i are comparable to the magnitudes of the corresponding S^a_i, and (ii) the total contribution of all six metabolites is composed of roughly half independent and half correlative contributions. Hence, all six significant metabolites are strongly correlated with one another, and ASD predisposition is not caused or represented by individual effects but rather by synergistic effects of the abnormalities of the six metabolites. In particular, x_24 (% oxidized glutathione, row 2 of the S^b(ij) matrix) and x_17 (Chlorotyrosine, row 4 of the S^b(ij) matrix) have the largest S^b_i values, 0.1224 and 0.1027, respectively (see Table 5), showing that, among the six metabolites, they are the most correlated with the others. The off-diagonal elements of the S^b(ij) matrix give the pairwise correlation between x_i and x_j.
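The bookkeeping relations just stated can be checked mechanically. The sketch below uses a small hypothetical symmetric matrix (the values are illustrative only, not from Eq (21)) to show that the diagonal gives S^a_i, the off-diagonal row sums give S^b_i, and the full row sums give the total sensitivities S_i.

```python
# Check of the S^b(ij) bookkeeping relations on a hypothetical, symmetric
# 3x3 matrix (illustrative values only; not the paper's Eq (21)).
import numpy as np

Sb = np.array([[ 0.10, 0.02, -0.01],
               [ 0.02, 0.12,  0.03],
               [-0.01, 0.03,  0.09]])

S_a = np.diag(Sb)                        # diagonal: independent contributions S^a_i
S_b = Sb.sum(axis=1) - S_a               # off-diagonal row sums: correlative S^b_i
S_total = Sb.sum(axis=1)                 # full row sums: total contributions S_i
assert np.allclose(S_total, S_a + S_b)   # S_i = S^a_i + S^b_i by construction
print("S^a_i:", S_a, " S^b_i:", S_b, " S_i:", S_total)
```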
Consider two pairwise correlations: the (6, 4)- and (1, 3)-entries of the S^b(ij) matrix, i.e., S^b(x_6, x_17) = 0.0166 and S^b(x_5, x_10) = −0.0066. Fig 7 plots the measurements of x_17 versus x_6 and of x_10 versus x_5, along with linear fits of their correlations (since a linear kernel is used in the SVM modeling of the present ASD-NEU data, the correlation here is linear; in the general case, the correlation may be nonlinear).
When x_6 increases, x_17 tends to increase, while when x_5 increases, x_10 tends to decrease, consistent in both cases with the signs of their S^b(ij)'s. Thus, the magnitudes and signs of S^b(x_6, x_17) and S^b(x_5, x_10) quantify the strength and direction of these pairwise correlations.
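The sign consistency noted above can be illustrated numerically: the slope of a linear fit between two metabolites' measurements should share the sign of their pairwise index. The sketch below uses synthetic, positively correlated data standing in for x_6 and x_17.

```python
# Sketch: the slope of a linear fit between two metabolites should match
# the sign of their pairwise index, e.g. S^b(x_6, x_17) > 0. Synthetic
# data are built here to co-vary positively, standing in for x_6 and x_17.
import numpy as np

rng = np.random.default_rng(2)
x6 = rng.normal(size=159)                            # stand-in for 8-OHG
x17 = 0.4 * x6 + rng.normal(scale=0.5, size=159)     # built to rise with x6

slope, intercept = np.polyfit(x6, x17, 1)            # degree-1 least-squares fit
print(f"fitted slope: {slope:.2f}")                  # positive slope expected
```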

Influence patterns of metabolites upon ASD predisposition
The first-order HDMR component functions provide information about the influence pattern of the metabolites upon ASD predisposition. With ASD = 1 and NEU = −1 in the SVM classification, Fig 8 shows that larger values of x_6 (8-OHG), x_17 (Chlorotyrosine), x_23 (fCystine/fCysteine) and x_24 (% oxidized glutathione), but smaller values of x_5 (% DNA methylation) and x_10 (Glu.-Cys.), imply a stronger ASD predisposition. Thus, the former metabolites might enhance, and the latter might inhibit, ASD predisposition. This information is valuable for aiding a pathological interpretation of the influence of the metabolites on ASD.

Conclusion
The discovery of metabolite abnormalities in the FOCM/TS pathways that increase or decrease ASD predisposition is a significant advance in understanding ASD. It is important to discern which metabolites are relevant to ASD and how they either inhibit or enhance it. In this paper, we present a new method that combines the scores produced in SVM modeling with HDMR sensitivity analysis to effectively and efficiently identify causative metabolites in the FOCM/TS pathways, rank their importance, and discover their independent and correlative action patterns upon ASD. We expect that such information will be important not only for a pathological interpretation but also for early diagnosis, ideally providing a path toward a comprehensive treatment of ASD. These prospects and analyses will surely benefit from additional metabolite data, and this paper provides an efficient means of extracting such information even as the number of metabolites under assessment grows. The new method, with only tens of model runs, can identify the best combination of metabolites in the FOCM/TS pathways leading to ASD, in comparison with a previous analysis method requiring hundreds of thousands of model runs [8]. The method introduced in the present paper may be useful for other types of biochemical applications and in other data-rich areas.