Artificial Neural Network Accurately Predicts Hepatitis B Surface Antigen Seroclearance

Background & Aims Hepatitis B surface antigen (HBsAg) seroclearance and seroconversion are regarded as favorable outcomes of chronic hepatitis B (CHB). This study aimed to develop artificial neural networks (ANNs) that could accurately predict HBsAg seroclearance or seroconversion on the basis of available serum variables. Methods Data from 203 untreated, HBeAg-negative CHB patients with spontaneous HBsAg seroclearance (63 with HBsAg seroconversion) and 203 age- and sex-matched HBeAg-negative controls were analyzed. ANNs and logistic regression models (LRMs) were built and tested for HBsAg seroclearance and for HBsAg seroconversion. Predictive accuracy was assessed with the area under the receiver operating characteristic curve (AUROC). Results Serum quantitative HBsAg (qHBsAg) and HBV DNA levels, and qHBsAg and HBV DNA reduction, were related to HBsAg seroclearance (P<0.001) and were used to build the ANN and LRM for HBsAg seroclearance, whereas qHBsAg reduction was not associated with ANN-HBsAg seroconversion (P = 0.197) and LRM-HBsAg seroconversion was based solely on qHBsAg (P = 0.01). For HBsAg seroclearance, the AUROCs of the ANN were 0.96, 0.93 and 0.95 for the training, testing and genotype B subgroups, respectively. They were significantly higher than those of LRM, qHBsAg and HBV DNA (all P<0.05). Although the performance of ANN-HBsAg seroconversion (AUROC 0.757) was inferior to that for HBsAg seroclearance, it tended to be better than those of LRM, qHBsAg and HBV DNA. Conclusions ANN identifies spontaneous HBsAg seroclearance in HBeAg-negative CHB patients with better accuracy, on the basis of easily available serum data. More useful predictors of HBsAg seroconversion still need to be explored in the future.


Introduction
In clinical practice, hepatitis B surface antigen (HBsAg) seroclearance and seroconversion have been recommended as the ideal outcomes both in the natural history of HBV infection and as an endpoint for the treatment of CHB [1]. Earlier HBsAg seroclearance or seroconversion is likely to result in a better prognosis because of lower HBV replication as well as less liver damage [1,2]. A few studies have explored the incidence of spontaneous HBsAg seroclearance in CHB patients in both Asian and European populations using long-term follow-up cohorts; the annual incidence ranges from 0.62% to 2.26% [3,4,5,6,7,8].
Because spontaneous HBsAg seroconversion is rarer than HBsAg seroclearance, the incidence and long-term outcomes of CHB patients experiencing this event remain disputed. Existing evidence indicates that HBsAg seroclearance or seroconversion confers favorable long-term outcomes in patients without hepatocellular carcinoma (HCC) or decompensated liver cirrhosis [9,10,11,12].
Predictive factors for spontaneous HBsAg seroclearance or seroconversion using various parameters have attracted much attention recently. Previous studies have demonstrated that a declining HBV DNA level is an important predictor of spontaneous HBsAg seroclearance [5,6,8,13]. Furthermore, with technological advances in its measurement, quantitative HBsAg (qHBsAg) has been suggested as a promising new marker for monitoring immunological response in both treated and untreated CHB patients, as well as a potential predictor of liver disease progression [14]. Our previous study showed that low qHBsAg levels and an increased reduction rate in qHBsAg levels were the most significant predictors of spontaneous HBsAg seroclearance with 3 years of follow-up [15]. These findings have been further validated by other studies [4,5,13,16,17]. However, our previous study had several limitations. No specific time point was identified at which qHBsAg kinetics had the highest predictive value. Also, the accuracy of qHBsAg levels in predicting HBsAg seroclearance [area under receiver operating characteristic curve (AUROC) 0.833] still warrants improvement [15]. In all currently available studies [4,5,13,15,16,17], the predictability of qHBsAg levels for HBsAg seroconversion has not been thoroughly investigated.
In a complex biological system, the interactions among predictors are multidimensional and non-linear, which makes it difficult to distinguish between classes using conventional linear discriminant analysis or a single predictor. The artificial neural network (ANN) is a computer model inspired by the workings of the human brain [18]. It consists of a set of highly interconnected processing units (neurons) linked with weighted connections, and includes an input layer, an output layer and one or more hidden layers. The input layer is formed from the different data available for the analysis and the output layer is formed from the different outcomes, whereas the hidden layers allow complex relations between the input and output layers to evolve. One of the outstanding characteristics of the ANN is that it can develop nonlinear statistical models to deal with complex biological systems [19].
The main aim of the present study was to assess the ability of ANNs to predict HBsAg seroclearance and seroconversion in a large population of CHB patients spontaneously clearing HBsAg with or without the appearance of anti-HBs, and to compare the performance of the ANNs with that of conventional logistic regression models (LRMs) as well as previously proven clinical parameters, such as qHBsAg and HBV DNA levels.

Materials and Methods
The composition of the present study cohort has been previously described, and is based on the comparison of CHB patients with spontaneous HBsAg seroclearance with age- and sex-matched HBeAg-negative controls [15]. The present study was a post-hoc analysis involving the entire cohort of our previous study. In brief, all of the patients were followed up regularly at the Liver Clinic, Department of Medicine, the University of Hong Kong, Queen Mary Hospital for at least 3 years. All patients had HBsAg positivity documented for more than six months and were HBeAg-negative on presentation to our clinic. Upon their first and/or follow-up visits, these patients had given verbal informed consent for the storage of blood samples for further studies.
HBsAg seroclearance or seroconversion was observed in the first group of patients between June 2001 and February 2011; these patients were then followed up regularly until June 2012 for their latest liver biochemistry and HBV serology. HBsAg seroclearance was defined as loss of serum HBsAg with or without the appearance of antibody to HBsAg (anti-HBs), while HBsAg seroconversion was defined as loss of serum HBsAg with the appearance of anti-HBs. These two end-points were confirmed by two samples taken at least six months apart. The control group, recruited between May 2010 and May 2011, was age- and sex-matched with the patient group achieving HBsAg seroclearance. None of the patients received treatment during the entire follow-up period. Serum samples collected at every visit were stored at −20°C until tested. Serum HBV DNA and qHBsAg levels were measured at 3 years and 2 years before HBsAg seroclearance and at the time of HBsAg seroclearance (i.e., baseline). The numbers of stored serum samples available for the HBsAg seroclearance or seroconversion group were 203, 190 and 203 at the time points of 3 years before, 2 years before and at the time of HBsAg seroclearance, respectively. The corresponding numbers of stored serum samples available for the control group were 203, 189 and 197.
Serum qHBsAg level was measured by the Elecsys HBsAg II assay (Roche Diagnostics GmbH, Mannheim, Germany) [20], with a lower limit of detection of 0.05 IU/mL. Samples with a qHBsAg level higher than 52000 IU/mL were retested at a dilution of 1:100, according to the manufacturer's instructions. Serum anti-HBs level was measured using an Abbott Laboratories assay (Chicago, Illinois), with a lower limit of detection of 10 mIU/mL. Serum HBV DNA level was measured using the Cobas Taqman assay (Roche Diagnostics, Branchburg, New Jersey), with a lower limit of detection of 20 IU/mL.
One hundred randomly chosen patients with HBsAg seroclearance and 100 age- and sex-matched controls were selected for the determination of HBV genotype using the INNO-LiPA HBV genotyping assay, which was performed according to the instructions of the manufacturer (Innogenetics, Gent, Belgium).

Ethics Statement
Verbal informed consent was obtained from all patients and recorded upon their first and/or subsequent follow-up visits for the storage of blood samples for further studies. The study, including the retrieval of archived samples, was approved by the Institutional Review Board, the University of Hong Kong and West Cluster of Hospital Authority, Hong Kong. All clinical investigation was conducted according to the principles expressed by the Declaration of Helsinki, with all data anonymously analyzed.

Statistical Analysis
Categorical variables were reported as the number of cases and percentages; continuous variables were tested for normal distribution using the Kolmogorov-Smirnov test. For patients with undetectable serum HBV DNA or qHBsAg, the results were taken as the lower limit of detection (20 and 0.05 IU/mL, respectively). As HBV DNA and qHBsAg levels showed a highly skewed distribution, they were log transformed (log10) before the analysis. After transformation, both variables showed a normal distribution (P>0.05). Differences in clinical and laboratory data related to HBsAg seroclearance or seroconversion were assessed using chi-square analysis with Yates correction and the independent-sample t-test after Levene's test for equality of variances, as appropriate. A subgroup analysis according to HBV genotype was also performed to further test the established models.
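The handling of undetectable results and the log10 transformation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the convention of passing `None` for an undetectable result are assumptions made for the example.

```python
import math

# Lower limits of detection quoted in the text (IU/mL).
LLOD_HBV_DNA = 20.0
LLOD_QHBSAG = 0.05

def log10_with_llod(value, llod):
    """Return log10 of a measurement, substituting the LLOD when undetectable.

    `value` is None when the assay reported "undetectable"; this convention
    is an assumption of the sketch, not something stated in the paper.
    """
    if value is None or value < llod:
        value = llod
    return math.log10(value)

# An undetectable HBV DNA result is analyzed as log10(20 IU/mL).
undetectable_dna = log10_with_llod(None, LLOD_HBV_DNA)
# A detectable qHBsAg result of 1000 IU/mL is analyzed as log10(1000).
detectable_qhbsag = log10_with_llod(1000.0, LLOD_QHBSAG)
```

Substituting the detection limit keeps every patient in the analysis, at the cost of slightly overstating the true level in patients below that limit.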

Development of the artificial neural network
Variables found to be significantly related to HBsAg seroclearance or seroconversion by univariate analyses were used to build the ANNs. Patients were randomly assigned to a training group (70% of total patients) or a testing group (30% of total patients). We built a three-layer feed-forward neural network with two output neurons. The learning rule used here was back propagation of errors, which adjusts the internal parameters of the network over repeated training cycles to reduce the overall error [21]. The weights of the connections between neurons were also altered to decrease the overall error of the network. Training was terminated when the sum of squared errors was at a minimum. The activation function, representing the outcomes of the ANN, produced continuous outputs in the interval from 0 to 1, in which 0 = HBsAg non-seroclearance/non-seroconversion and 1 = HBsAg seroclearance/seroconversion. The cut-offs of ANN outputs with the best relationship between sensitivity and specificity were used for classification. The relative weights of the input variables for the ANNs were calculated according to the General Influence Measure method [22]. In this study, we built ANNs by using the graphical neural network development tool NeuroSolution V5.05 (Neurodimension, Gainesville, FL, USA).
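To make the architecture concrete, the following is a minimal pure-Python sketch of a three-layer feed-forward network with two sigmoid output neurons, trained by back propagation of errors on a 70%/30% split. It is not the NeuroSolution configuration used in the study. The hidden-layer size, learning rate, fixed number of training cycles, class names and the synthetic data are all assumptions chosen for illustration, and training here stops after a fixed number of cycles rather than at the minimum sum of squared errors.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def split_70_30(rows, rng):
    """Shuffle rows and split them into 70% training and 30% testing lists."""
    rows = list(rows)
    rng.shuffle(rows)
    cut = int(round(0.7 * len(rows)))
    return rows[:cut], rows[cut:]

class ThreeLayerNet:
    """Input layer -> one hidden layer -> two sigmoid output neurons."""

    def __init__(self, n_in, n_hidden, rng, n_out=2):
        # Each weight row ends with a bias weight.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                    for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w_h]
        o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w_o]
        return h, o

    def sse(self, data):
        """Sum of squared errors over all samples and both output neurons."""
        return sum((t - o) ** 2
                   for x, target in data
                   for t, o in zip(target, self.forward(x)[1]))

    def train_epoch(self, data, lr=0.5):
        for x, target in data:
            h, o = self.forward(x)
            # Output deltas: squared-error gradient through the sigmoid
            # (the constant factor 2 is absorbed into the learning rate).
            d_o = [(o_k - t_k) * o_k * (1.0 - o_k) for o_k, t_k in zip(o, target)]
            # Hidden deltas: output deltas propagated back through w_o.
            d_h = [h_j * (1.0 - h_j) * sum(d_o[k] * self.w_o[k][j] for k in range(len(d_o)))
                   for j, h_j in enumerate(h)]
            for k, row in enumerate(self.w_o):
                for j, v in enumerate(h + [1.0]):
                    row[j] -= lr * d_o[k] * v
            for j, row in enumerate(self.w_h):
                for i, v in enumerate(x + [1.0]):
                    row[i] -= lr * d_h[j] * v

# Synthetic data (not patient data): two inputs scaled to [0, 1]; the "event"
# class is assigned when their sum is low. Targets are [non-event, event].
rng = random.Random(0)
data = []
for _ in range(100):
    x = [rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)]
    event = 1.0 if x[0] + x[1] < 1.0 else 0.0
    data.append((x, [1.0 - event, event]))

train, test = split_70_30(data, rng)
net = ThreeLayerNet(n_in=2, n_hidden=3, rng=rng)
sse_before = net.sse(train)
for _ in range(200):
    net.train_epoch(train)
sse_after = net.sse(train)
```

The second output neuron plays the role of the continuous 0 to 1 score described above; a cut-off on that score would then be chosen from the ROC curve.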

Development of the multivariate logistic regression model
In the training group (70% of total patients), variables found to be significantly related to HBsAg seroclearance or seroconversion by univariate analysis were entered into two distinct forward conditional multivariate logistic regression models (LRMs). Logistic regression generated the coefficients of a formula to predict a logit transformation of the probability of presence of the characteristic of interest: $\mathrm{logit}(p) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$. The probability of presence of the characteristic of interest was obtained by the formula $p = 1/(1 + e^{-\mathrm{logit}(p)})$, in which 0 = HBsAg non-seroclearance/non-seroconversion and 1 = HBsAg seroclearance/seroconversion. The cut-offs of logistic regression outputs with the best relationship between sensitivity and specificity were adopted for classification.
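The two formulas above translate directly into code. The sketch below is illustrative only: the intercept and coefficient are hypothetical values, not the fitted coefficients of the LRMs in this study.

```python
import math

def logit_score(intercept, coefficients, values):
    """Linear predictor: b0 + b1*x1 + ... + bk*xk."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

def probability(logit):
    """Inverse logit: p = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical model with one predictor (for example, log10 qHBsAg).
b0 = 2.0
b = [-1.5]
p = probability(logit_score(b0, b, [1.0]))  # logit = 2.0 - 1.5 = 0.5
```

A logit of 0 corresponds to a probability of 0.5, and a negative coefficient means that higher predictor values lower the predicted probability of the event.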

Assessment of the diagnostic accuracy
The performances of both ANNs and LRMs, as well as qHBsAg and HBV DNA levels, in predicting HBsAg seroclearance or seroconversion in the training group and in three validation groups (testing group, genotype B group, genotype C group) were tested using receiver operating characteristic (ROC) curve analysis and expressed in terms of sensitivity, specificity, positive predictive values (PPV) and likelihood ratios (LR). The Youden index was calculated to identify the optimal cut-off value. Comparison of ROC curves was performed using the Hanley-McNeil method [23].
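The accuracy measures named above can be illustrated with a short sketch. The AUROC is computed here as the proportion of event/non-event pairs in which the event case has the higher score, which is one standard way to obtain it and not necessarily how MedCalc computes it. The function names and the ten scores are invented for the example.

```python
def auroc(pos, neg):
    """Probability that a random event case outscores a random non-event case
    (ties count as one half)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def confusion_metrics(cutoff, pos, neg):
    """Accuracy measures when scores >= cutoff are classified as events."""
    tp = sum(1 for s in pos if s >= cutoff)
    fp = sum(1 for s in neg if s >= cutoff)
    tn = len(neg) - fp
    sens = tp / len(pos)
    spec = tn / len(neg)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "lr_pos": sens / (1.0 - spec) if spec < 1.0 else float("inf"),
        "lr_neg": (1.0 - sens) / spec if spec > 0.0 else float("inf"),
        "youden": sens + spec - 1.0,
    }

def youden_cutoff(pos, neg):
    """Observed score that maximizes the Youden index (sens + spec - 1)."""
    candidates = sorted(set(pos) | set(neg))
    return max(candidates, key=lambda c: confusion_metrics(c, pos, neg)["youden"])

# Invented model outputs for five event and five non-event cases.
event_scores = [0.9, 0.8, 0.7, 0.6, 0.3]
control_scores = [0.5, 0.4, 0.2, 0.1, 0.05]

auc = auroc(event_scores, control_scores)
cutoff = youden_cutoff(event_scores, control_scores)
metrics = confusion_metrics(cutoff, event_scores, control_scores)
```

In this toy data set 23 of the 25 event/non-event pairs are ordered correctly, so the AUROC is 0.92, and the Youden index is maximized at a cut-off of 0.6.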
A two-sided P value of <0.05 was considered statistically significant. Statistical analysis and ROC analysis were computed by MedCalc 10.0 software (Mariakerke, Belgium) and SPSS 18.0 software (SPSS Inc, Chicago, IL, USA).
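As a rough illustration of how two AUROCs can be compared, the sketch below uses the Hanley-McNeil approximation for the standard error of an AUROC and a z-test with a two-sided P value. It is a sketch under stated assumptions, not a reproduction of the MedCalc procedure: the correlation `r` between the two AUROCs, which matters when both models are evaluated on the same patients, is simply passed in as a parameter (the original method obtains it from a published table), and the AUROC values and group sizes in the example are hypothetical.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley-McNeil approximation to the standard error of an AUROC."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)

def compare_aucs(auc_a, auc_b, n_pos, n_neg, r=0.0):
    """z statistic and two-sided P value for the difference of two AUROCs.

    r is the assumed correlation between the two AUROC estimates;
    r = 0 treats them as independent.
    """
    se_a = hanley_mcneil_se(auc_a, n_pos, n_neg)
    se_b = hanley_mcneil_se(auc_b, n_pos, n_neg)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)
    z = (auc_a - auc_b) / se_diff
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Hypothetical AUROCs for two models tested on 100 events and 100 controls.
z_indep, p_indep = compare_aucs(0.90, 0.80, n_pos=100, n_neg=100)
z_corr, p_corr = compare_aucs(0.90, 0.80, n_pos=100, n_neg=100, r=0.5)
significant = p_indep < 0.05
```

A positive correlation between the two estimates shrinks the standard error of their difference, so the same AUROC gap yields a larger z statistic when the models are compared on the same patients.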

Baseline Characteristics of Patients
Baseline characteristics of the entire study population are outlined in Table 1. The mean age was 48.8 ± 10.9 years and patients were predominantly male (70.4%). Sixty-three patients (31.0%) in the HBsAg seroclearance group had developed anti-HBs. There were no significant differences in the distribution of age, gender, alanine aminotransferase (ALT) level, bilirubin and genotype when comparing patients with HBsAg seroclearance with and without seroconversion (all P>0.05). Patients with HBsAg seroclearance or seroconversion had significantly lower serum qHBsAg and HBV DNA levels at baseline (all P<0.001) compared to controls, as previously described [15]. Specific characteristics of the four groups/subgroups (training, testing, genotype B and genotype C) are outlined in Table 2 and Tables S1-S5. There were

Validation in testing group
When the ANNs were evaluated in the testing group, the performance of the ANN in predicting HBsAg seroclearance was very high, with AUROC of 0.929 (95%CI = 0.862-0.969) which

Discussion
HBsAg seroclearance and seroconversion are accepted worldwide as the two most powerful indicators of prognosis in CHB patients, as shown by many studies investigating these topics [2,9,10,11,12,24]. Predicting these events in untreated or treated CHB patients is therefore highly desirable, as it could help hepatologists provide the optimal therapeutic regimen [1].
In recent years, ANN modeling has been increasingly used in clinical management and disease prognostication, including the prediction of disease-free survival in HCC patients [25], the assessment of preoperative HCC grading and micro-vascular invasion [26], and the prediction of mortality risk in patients with end-stage liver disease or acute-on-chronic hepatitis B liver failure [27,28]. Owing to its three main advantages, namely self-learning, self-adapting and inference, the ANN model has been demonstrated to perform better than conventional discriminant analysis in precisely predicting disease outcomes [19]. To date, the complex interaction of the different variables that can be obtained during the natural history of CHB has not led to any predictive model able to recognize HBsAg seroclearance or seroconversion with sufficient accuracy to be usefully employed as an easy-to-use tool in the clinical setting. In the present study, the ANN was superior to linear discriminant analysis, as well as to qHBsAg and HBV DNA levels, in the training group, was non-inferior to linear discriminant analysis in the testing group, and was very reliable in identifying HBsAg seroclearance. The better performance of the ANN supports the postulation that HBsAg seroclearance is a complex, multidimensional nonlinear function [18,19]. Our model was able to give a more precise estimate of HBsAg seroclearance on the basis of serum-based data routinely available in the clinical setting.
Our previous study showed that low qHBsAg levels and an increased rate of qHBsAg decline could predict HBsAg seroclearance [15]. When these two clinical parameters were entered into the ANN and LRM, the accuracy of low qHBsAg in predicting HBsAg seroclearance was further increased (AUROC 0.847, 95%CI = 0.797-0.889). As qHBsAg levels decrease gradually over time, lower qHBsAg levels or a rapid rate of qHBsAg reduction would eventually lead to HBsAg seroclearance or seroconversion [16]. Another important finding concerned HBV DNA levels and their reduction, which had previously been considered powerful predictors of HBsAg seroclearance in the pre-qHBsAg era [6,8]. Liu et al. found that a decrease in HBV DNA levels was the most important predictor of HBsAg seroclearance [8]. However, the predictability of HBsAg seroclearance increased greatly when they took the qHBsAg level into consideration [13]. In the present study, we compared the combinations of the above predictors (ANN and LRM) with the separate predictors (qHBsAg and HBV DNA).
Table 5. Sensitivity, specificity, predictive values and likelihood ratios of models according to optimal cut-off for predicting HBsAg seroclearance.
Under these circumstances, use of the present ANN for HBsAg seroclearance, but not the ANN for HBsAg seroconversion, could lead to an improvement in diagnostic accuracy and in tailoring the best individual clinical management. The ANN for HBsAg seroconversion (AUROC 0.757) was inferior to that for HBsAg seroclearance. One potential reason was the relatively small sample size (n = 63), which could affect the performance of the ANN [22]. Nonetheless, given the rarity of HBsAg seroconversion, it would be difficult to recruit more subjects for a more thorough analysis. Another important reason was the lack of significant predictors other than the currently available qHBsAg [24]. A good model for predicting HBsAg seroconversion remains to be discovered. Similarly, in the genotype C subgroup, the performance of the ANN in predicting HBsAg seroclearance was not statistically significant, possibly because the analysis was underpowered, since genotype C comprised only approximately one-third of the total patient cohort. A validation study concentrating on genotype C patients could be considered in the future.
Our study was limited in that the ANN was built and tested on a single-center cohort, and it could thus be argued that data originating from other centers might lead to different conclusions. However, we believe that this should not be considered a shortcoming, since the distinctive characteristic of the ANN is that it can learn through examples, making the prediction of HBsAg seroclearance, and even of HBsAg seroconversion, feasible on datasets from other sources.
In conclusion, the ANN could accurately predict spontaneous HBsAg seroclearance in HBeAg-negative CHB patients on the basis of easily available serum data collected over a period of no more than 3 years. The ANN for HBsAg seroclearance was superior to the conventional statistical linear approach and could be used in predicting the outcome of CHB. The performance of the ANN for HBsAg seroclearance can be further improved by including new cases from other centers, owing to the learning ability of neural networks.

Supporting Information
Table S1 Baseline characteristics of the study population stratified by HBsAg seroclearance subgroups. (DOC)