Two of Them Do It Better: Novel Serum Biomarkers Improve Autoimmune Hepatitis Diagnosis

Background: Autoimmune hepatitis (AIH) is a chronic liver disease of unknown aetiology characterized by continuing hepatocellular inflammation and necrosis. Autoantibodies represent accessible markers for measuring adaptive immune responses in clinical investigation. Protein microarrays have become an important tool to discriminate the disease state from control groups, even though there is no agreed-upon standard for analyzing the results.
Results: In the present study 15 sera from patients with AIH and 78 from healthy donors (HD) were tested against 1626 proteins using an in-house-developed array. Data interpretation using Partial Least Squares Discriminant Analysis (PLS-DA) led to the identification of both new and previously identified proteins. Two new proteins, AHPA9419 and Chondroadherin precursor (UNQ9419 and CHAD, respectively), as well as previously identified candidates, were confirmed in a validation phase by DELFIA assay using a new cohort of AIH patients. A receiver operating characteristic analysis was used to evaluate the biomarker candidates. The sensitivity of each autoantigen in AIH ranged from 65 to 88%; moreover, when the two new autoantigens were analyzed in combination, the sensitivity increased to 95%.
Conclusions: Our findings demonstrate that detecting autoantibodies against the two autoantigens could improve performance in discriminating AIH patients from control classes and that, in combination with previously identified autoantigens, they could be used as diagnostic/prognostic markers.

In the last few years various techniques have been proposed to solve the problem of imbalanced sample distribution [2]; these approaches fall mainly into three categories: sampling, algorithms and feature selection. The problem can be attenuated by undersampling or oversampling, which produce class-balanced data. Undersampling, however, loses valuable information, because only part of the majority class is used to train the classifier. To overcome this issue we proposed to make better use of the majority class by sampling several subsets independently from it, training a classifier on each subset separately, and combining the trained classifiers into a final output (a panel of autoantigens). This approach outperforms simple undersampling, since multiple subsets contain more information than a single one.
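The repeated-subset sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `balanced_subsets`, the sample identifiers, and the choice to draw each healthy-donor subset at the same size as the AIH class are all assumptions, since the text does not state the subset size. The class sizes (15 and 78) and the number of subsets (50) are taken from the text.

```python
import random

def balanced_subsets(minority, majority, n_subsets, seed=0):
    """Pair the full minority class with n_subsets independent random
    draws from the majority class (each draw without replacement).

    Assumption: each majority draw has the same size as the minority class.
    """
    rng = random.Random(seed)
    k = len(minority)
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(majority, k)  # one independent majority subset
        subsets.append((list(minority), drawn))
    return subsets

# Hypothetical sample identifiers: 15 AIH sera and 78 HD sera.
aih = [f"AIH_{i}" for i in range(15)]
hd = [f"HD_{i}" for i in range(78)]

# Fifty class-balanced datasets, one per independent draw of HD samples.
datasets = balanced_subsets(aih, hd, n_subsets=50)
print(len(datasets), len(datasets[0][0]), len(datasets[0][1]))
```

Each of the resulting datasets would then be used to train one classifier, and the per-dataset feature selections combined into the final panel.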
Given these different versions of the original data, it was necessary to evaluate the effect of instance perturbation on the feature selection results [3], because such results tend to be unstable. Indeed, it is now well documented that different feature sets can produce similar prediction patterns [4]. This translates into a lack of stability and robustness of the protein expression signatures, which makes selecting sets of relevant features for a classification task a critical issue and, at the same time, makes their biological interpretation challenging.
To address these needs we followed the strategy proposed by Kalousis et al., who defined the stability of a feature selection algorithm as the robustness of the "feature preferences" it produces to training set perturbations [5]. Measuring stability requires a similarity measure for feature preferences.
To evaluate the degree of similarity/dissimilarity among the protein lists we used the Tanimoto index [6], which measures the amount of overlap between two sets of arbitrary cardinality. It ranges between 0, meaning no overlap between the two sets, and 1, meaning the two sets are identical. The similarity of each pair of feature sets is computed using the Tanimoto index (with R subsets, R(R-1)/2 pairs are possible). The more similar the subsets are, the higher the similarity measure will be. In this work, two feature selection techniques were considered for the stability analysis: Recursive Support Vector Machine (R-SVM) [7] and Partial Least Squares Discriminant Analysis (PLS-DA) were chosen as representative supervised feature selection methods. R-SVM is a modified support vector machine algorithm that performs feature selection while building the classifier in a multi-step recursive manner following a given descending ladder; the details of the method have been described by Zhang et al. (2006).
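The pairwise Tanimoto comparison described above can be illustrated with a short sketch. The Tanimoto index of two sets is the size of their intersection divided by the size of their union. Summarizing the R(R-1)/2 pairwise values by their mean is an assumption made here for illustration; the function names and the protein labels `P1` to `P4` are hypothetical.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto index |A ∩ B| / |A ∪ B| of two feature sets (0 to 1)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(union)

def mean_pairwise_stability(feature_sets):
    """Average Tanimoto index over all R(R-1)/2 pairs of feature sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("at least two feature sets are required")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Hypothetical protein lists selected on three perturbed datasets.
selected = [
    {"P1", "P2", "P3"},
    {"P1", "P2", "P4"},
    {"P1", "P2", "P3"},
]
stability = mean_pairwise_stability(selected)
print(round(stability, 3))
```

In this toy example the three pairs score 0.5, 1.0 and 0.5, so the mean stability is 2/3.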
Model-based multivariate analysis: For each of the fifty generated datasets a PCA model was fitted and, when an outlier was present, it was removed from the subsequent analysis. Two-class PLS-DA modeling was therefore based on this cleaned dataset. A dummy matrix of two Y-variables [8] expressing the diagnosis of the sera samples was created. Fifty PLS-DA models were calculated with different subsets of HD samples, and the values of the parameters (R²X, R²Y and Q²Y) are presented in Supplementary Table S3. These parameters were positive, indicating the existence of a robust discriminative pattern between AIH and HD samples. In PLS-DA, the R²Y and Q²Y parameters were used to evaluate the models, indicating goodness of fit and predictive ability, respectively. The explanatory ability of the model increases with the number of components, whereas its predictive ability begins to decrease after a certain number of components.
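For reference, the fit and prediction parameters are conventionally defined as below. The text does not state these formulas, so they are given as the standard definitions rather than as the authors' exact computation; the symbols $y_i$ (observed response), $\hat{y}_i$ (fitted value), $\hat{y}_{(i)}$ (cross-validated prediction for a held-out sample) and $\bar{y}$ (mean response) are introduced here for illustration.

```latex
R^2Y = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad
Q^2Y = 1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}},
\quad\text{where}\quad
\mathrm{RSS} = \sum_i (y_i - \hat{y}_i)^2, \;
\mathrm{PRESS} = \sum_i (y_i - \hat{y}_{(i)})^2, \;
\mathrm{TSS} = \sum_i (y_i - \bar{y})^2 .
```

Because RSS can only shrink as components are added while PRESS eventually grows, R²Y keeps increasing with model complexity whereas Q²Y peaks and then declines.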
Statistical model validation: One limitation of cross-validation is that it assesses only the predictive power; it remains unknown which Q² value corresponds to a valid model and whether there is a statistically relevant difference between the groups. Therefore, permutation testing was employed to measure the statistical significance of the diagnostic statistics [9][10][11]. The approach produces a distribution of Q² values suitable for testing the null hypothesis for a model's Q²; a reliable model should yield a significantly larger Q² value than those generated from random models using the same dataset [12]. For each of the 50 datasets, we randomly permuted the labels of the response matrix 1000 times and computed new classification models; in this way a null reference distribution, expected to be non-significant, was obtained for each performance parameter (R²Y, Q²Y). The results of response permutation testing offered a favorable picture: for the 1000 datasets in which the labels were randomly permuted, the distributions of R²Y and Q²Y were always substantially lower than the corresponding real values. Hence it can be concluded that models with the same predictive values are highly unlikely to be obtained by chance, and that none of the models overfits the data.
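The response permutation procedure can be sketched generically as follows. This is an illustration under stated assumptions, not the authors' pipeline: the score function `mean_difference` is a deliberately simple stand-in for a model statistic such as Q²Y, the toy intensities are invented, and the empirical p-value formula (count of null scores at least as large as the observed one, plus one, over the number of permutations plus one) is a common convention that the text does not specify.

```python
import random

def permutation_test(values, labels, score_fn, n_permutations=1000, seed=0):
    """Compare an observed score with its null distribution obtained by
    randomly permuting the class labels."""
    rng = random.Random(seed)
    observed = score_fn(values, labels)
    shuffled = list(labels)
    null_scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between data and labels
        null_scores.append(score_fn(values, shuffled))
    n_extreme = sum(s >= observed for s in null_scores)
    p_value = (n_extreme + 1) / (n_permutations + 1)
    return observed, null_scores, p_value

def mean_difference(values, labels):
    """Stand-in statistic (NOT Q2Y): absolute difference of class means."""
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

# Invented toy intensities: 15 "AIH" values clearly above 15 "HD" values.
values = [5.0 + 0.1 * i for i in range(15)] + [1.0 + 0.1 * i for i in range(15)]
labels = [1] * 15 + [0] * 15

observed, null_scores, p_value = permutation_test(values, labels, mean_difference)
print(observed, max(null_scores), p_value)
```

When the observed statistic lies well above the whole null distribution, as in this toy case, the empirical p-value approaches its lower bound of 1/(n_permutations + 1), which is the quantitative counterpart of the "substantially lower" null distributions reported above.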