Initial assessment of the infant with neonatal cholestasis—Is this biliary atresia?

Introduction Optimizing outcome in biliary atresia (BA) requires timely diagnosis. Cholestasis is a presenting feature of BA, as well as other diagnoses (Non-BA). Identification of clinical features of neonatal cholestasis that would expedite decisions to pursue subsequent invasive testing to correctly diagnose or exclude BA would enhance outcomes. The analytical goal was to develop a predictive model for BA using data available at initial presentation. Methods Infants at presentation with neonatal cholestasis (direct/conjugated bilirubin >2 mg/dl [34.2 μM]) were enrolled prior to surgical exploration in a prospective observational multi-centered study (PROBE–NCT00061828). Clinical features (physical findings, laboratory results, gallbladder sonography) at enrollment were analyzed. Initially, 19 features were selected as candidate predictors. Two approaches were used to build models for diagnosis prediction: a hierarchical classification and regression decision tree (CART) and a logistic regression model using a stepwise selection strategy. Results In PROBE April 2004-February 2014, 401 infants met criteria for BA and 259 for Non-BA. Univariate analysis identified 13 features that were significantly different between BA and Non-BA. Using a CART predictive model of BA versus Non-BA (significant factors: gamma-glutamyl transpeptidase, acholic stools, weight), the receiver operating characteristic area under the curve (ROC AUC) was 0.83. Twelve percent of BA infants were misclassified as Non-BA; 17% of Non-BA infants were misclassified as BA. Stepwise logistic regression identified seven factors in a predictive model (ROC AUC 0.89). Using this model, a predicted probability of >0.8 (n = 357) yielded an 81% true positive rate for BA; <0.2 (n = 120) yielded an 11% false negative rate. Conclusion Despite the relatively good accuracy of our optimized prediction models, the high precision required for differentiating BA from Non-BA was not achieved. 
Accurate identification of BA in infants with neonatal cholestasis requires further evaluation, and BA should not be excluded based only on presenting clinical features.


Introduction
Neonatal cholestasis is a relatively common clinical issue that presents a complex diagnostic challenge for clinicians [1]. Cholestasis may not be readily identified at its onset and, as such, may present late in the course of the underlying disease process. An expansive differential diagnosis underlies the condition, which challenges one to prioritize diagnostic evaluations in order to sort through a complex set of etiologies in a relatively short time [2]. Shotgun approaches to diagnosis are typically not feasible in infants, while identification of life-threatening and treatable causes of cholestasis is a high priority. Newborn screening has the potential to identify some of the relevant disease processes. One of the most important and relatively common specific causes of neonatal cholestasis is biliary atresia (BA). Timely diagnosis of BA is ultimately made by cholangiography at the time of exploratory laparotomy and histologic assessment of the surgically-removed bile duct remnant. Such timely diagnosis has the potential to improve clinical outcomes, as earlier hepatic portoenterostomy is associated with longer survival without liver transplantation [3]. Deciding which infants should undergo surgical exploration is critical. Ideally, one would like to minimize the number of infants who undergo unnecessary surgery, while not missing or delaying the diagnosis of BA. There is no universal consensus on the sequential steps to be taken in the diagnostic evaluation of neonatal cholestasis from the time of presentation leading up to exploratory surgery.
The Childhood Liver Disease Research Network (ChiLDReN), a National Institutes of Health-funded consortium, has conducted a prospective longitudinal study of 875 infants presenting with neonatal cholestasis at 15 clinical sites in the United States and Canada over an 11-year period. Data collected included details of the presenting clinical features, demographics, physical findings, laboratory values, and gallbladder sonography results that are typically available in routine clinical practice. Using these data, the objective of this study was to determine the predictive value for BA of typical testing performed in the evaluation of cholestatic infants prior to the decision for invasive testing (e.g., liver biopsy, cholangiography, exploratory laparotomy). A secondary goal was to develop a diagnostic algorithm to help guide the clinician's decision-making for invasive testing.

Study population
Between April 2004 and February 2014, infants presenting with neonatal cholestasis were enrolled in a prospective observational study of infants with cholestasis (PROBE: https://clinicaltrials.gov/ct2/show/NCT00061828, conducted by ChiLDReN). Written informed consent was obtained from the study participants' parents or guardians, and the protocol was carried out under institutional review board (IRB) approval. Given the age of the participants, assent was not feasible. The IRB at each participating institution has approved PROBE (S1 Table). Inclusion criteria were: 1) age ≤180 days at presentation to a ChiLDReN center; and 2) serum direct or conjugated bilirubin >20% of total bilirubin (TB) and ≥2 mg/dl. The PROBE protocol permitted laboratory studies drawn prior to enrollment ("presentation") to be used for inclusion criteria. Exclusion criteria were: 1) acute liver failure; 2) previous hepatobiliary surgery; 3) bacterial or fungal sepsis; 4) hypoxia, shock, or ischemic hepatopathy; 5) malignancy; 6) primary hemolytic disease; 7) drug or total parenteral nutrition-associated cholestasis; 8) extracorporeal membrane oxygenation (ECMO)-associated cholestasis; or 9) birth weight <1500 g in an infant who did not have BA. Presenting clinical features (including stool color), demographics, physical findings, laboratory data, and gallbladder sonography findings were collected prospectively and recorded prior to the ultimate assignment of a clinical diagnosis. Evaluations of neonatal cholestasis were not prescribed by the protocol; they followed local practice and were conducted at local facilities.
Not all participants enrolled in PROBE were included in this analysis of predictors of BA. Participants were included only if they had laboratory studies indicating direct/conjugated hyperbilirubinemia that were performed at the time of "presentation" to the ChiLDReN clinical site. Inclusion in the BA cohort (Group 1) for this analysis required either the performance of a biliary drainage procedure for BA or exploratory surgery with the finding of an atretic extrahepatic bile duct by either inspection or attempted cholangiography. BA could not be definitively "confirmed" in infants who presented "late" in the clinical course and in whom clinicians determined that laparotomy or laparoscopy would not benefit the child or alter management. Inclusion in the Non-BA cohort (Group 2) required the identification of a specific alternative etiology for their cholestasis or cholangiography that excluded BA. For an infant with the clinical diagnosis of idiopathic neonatal hepatitis (INH) or idiopathic cholestasis (IC) to be included in this analysis, resolution of cholestasis was required as defined by a subsequent TB <1.0 mg/dL at >120 days of age (without hepatic portoenterostomy). INH was defined as neonatal cholestasis in which histologic evidence of giant cell hepatitis was present on liver biopsy and for whom no other etiology was confirmed. IC was defined as neonatal cholestasis that resolved in an infant who did not undergo liver biopsy or did not have giant cell hepatitis on a liver biopsy, and for whom no other etiology was confirmed. The outcome variable for this study is a confirmed study definition meeting diagnosis of BA or Non-BA (i.e., Group 1 vs. Group 2).

Candidate predictors
Twenty-two variables collected at the time of the first evaluation at the ChiLDReN center were considered as candidate predictors, including age at disease onset and first evaluation, sex, race, ethnicity, anthropometrics (weight z-score, height z-score, head circumference z-score), palpable liver (including number of centimeters below the costal margin at the midclavicular line), palpable spleen, acholic stools, Alagille "syndromic" facial features, serum TB (defined as conjugated + unconjugated when total not measured), conjugated/direct bilirubin, alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (ALP), gamma-glutamyl transpeptidase (GGTP), albumin, platelet count, cholesterol, and gallbladder sonography (presence or absence of the gallbladder, "small" gallbladder equated with presence). Age at first evaluation was defined as the earliest date among dates of study informed consent, diagnosis, or surgery; age at disease onset was defined as the earliest age at which there was caregiver reported icterus of eyes or skin, darkening of urine, or white/pale stools in the initial history case report form.

Statistical analysis
Descriptive statistics for the characteristics listed above were provided for BA and Non-BA subjects included in the model development and those not included (Group 3 = BA not included and Group 4 = Non-BA not included). Differences between Groups 1 and 2 were assessed using two-sample t-tests for the continuous parameters. Variables with skewed distributions were analyzed after first applying a log transformation, with the accompanying descriptive statistics reported on the original scale. Categorical variables were assessed using a chi-square test, or Fisher's exact test where cell sizes were ≤5 participants.
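As a rough illustration of the comparison described above for a skewed continuous variable, the following Python sketch log-transforms two groups and computes a two-sample t statistic. The pooled-variance form, the function name log_t_statistic, and the toy values are assumptions made for illustration only; the text does not state which t-test variant was used, and this is not the authors' code.

```python
import math
from statistics import mean, variance
from typing import Sequence


def log_t_statistic(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Pooled-variance two-sample t statistic on natural-log values.

    The pooled-variance (Student) form is an assumption; values must be > 0.
    """
    a = [math.log(x) for x in group_a]
    b = [math.log(x) for x in group_b]
    n_a, n_b = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err


# Hypothetical right-skewed laboratory values for two groups (not study data).
toy_group_1 = [10.0, 100.0, 1000.0]
toy_group_2 = [100.0, 1000.0, 10000.0]
t_stat = log_t_statistic(toy_group_1, toy_group_2)
```

The statistic would then be compared against a t distribution with n_a + n_b - 2 degrees of freedom; descriptive statistics would still be reported on the original scale, as stated above.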

Model development
Two types of model were used to find the best prediction models: a hierarchical classification and regression tree (CART) and a logistic regression model [4]. All 22 factors mentioned above were considered by both approaches, regardless of whether or not they obtained statistical significance in the univariate setting. CART analysis recursively partitions observations to define the optimum cutoff point for continuous predictors and identifies homogeneous groups having the largest difference in the outcome variable (minimum misclassification error rate). Each partition is a binary split based on a single independent variable. This process results in a classification rule with the optimum cut point for continuous variables and is represented as a tree. Once the full tree was grown, a pruning algorithm was run to avoid over-fitting. In the pruning process, the chi-square statistic for 2x2 contingency tables was calculated for each split. Using a pre-selected alpha level (p = 0.10), nodes whose chi-square values, as well as the chi-square values of subsequent splits, did not exceed the predetermined threshold were pruned.
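The two ingredients described above, a binary split chosen to minimize misclassification and a chi-square check on the resulting 2x2 table, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function names, the midpoint rule for candidate cut points, and the toy values are all hypothetical. The critical value 2.706 is the standard chi-square cutoff for alpha = 0.10 with one degree of freedom.

```python
from typing import List, Sequence, Tuple

CHI2_CRITICAL_ALPHA_010_DF1 = 2.706  # chi-square cutoff, alpha = 0.10, 1 df


def best_split(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, int]:
    """Return (threshold, errors) for the single cut that minimizes misclassification.

    Each side of the cut predicts its majority class; labels are 0 or 1.
    Candidate cut points are midpoints between distinct values (an assumption).
    """
    distinct = sorted(set(values))
    best: Tuple[float, int] = (float("nan"), len(labels) + 1)
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left: List[int] = [y for x, y in zip(values, labels) if x < threshold]
        right: List[int] = [y for x, y in zip(values, labels) if x >= threshold]
        errors = min(sum(left), len(left) - sum(left)) + min(
            sum(right), len(right) - sum(right)
        )
        if errors < best[1]:
            best = (threshold, errors)
    return best


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Hypothetical values of one continuous predictor; 1 = BA, 0 = Non-BA (not study data).
toy_values = [50, 80, 120, 150, 300, 400, 500, 650]
toy_labels = [0, 0, 0, 1, 1, 1, 0, 1]
threshold, errors = best_split(toy_values, toy_labels)

# 2x2 table for that split: rows = below/above the cut, columns = BA/Non-BA.
chi2 = chi_square_2x2(0, 3, 4, 1)
keep_split = chi2 > CHI2_CRITICAL_ALPHA_010_DF1
```

In a full tree the same search would be repeated within each resulting subgroup, and a split would be pruned only if neither it nor any split beneath it cleared the threshold.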
A logistic regression prediction model was constructed using a forward stepwise hierarchical approach, with higher than standard p value, α = 0.10 [5][6][7]. To avoid losing study sample due to missing data, a sequential regression imputation method was used to impute missing values [8]. Only one randomly selected imputed data set was used for model development [9]. To define appropriate transformation of continuous variables, we used penalized-spline functions to explore the potential nonlinear effect of potential continuous predictors [10]. Potential interaction effects identified through CART analysis were considered in the model development process. The final model consists of only variables maintaining a 0.10 level of significance.

Model evaluation
The ability of the multivariate model to correctly classify patients into the dichotomous disease classification (BA vs. Non-BA) was determined by assessing the area under the receiver operating characteristic (ROC) curve (AUC), where larger values on the 0-1 scale indicate greater concordance between the predicted and observed disease groups. Reapplying the model to our data, we further evaluated the disease misclassification rates at what are considered more definitive predicted probability thresholds.
The CART analysis was performed using R (version 3.2.2) software. Data imputation and all other analyses were conducted using SAS (version 9.3) [4].
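For readers unfamiliar with the concordance interpretation of the AUC mentioned above, the following Python sketch computes it directly: the AUC equals the proportion of (BA, Non-BA) pairs in which the BA infant receives the higher predicted probability, with ties counted as one half. The function name and the toy scores are hypothetical, and this illustration is separate from the R and SAS analyses actually used in the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of concordant (positive, negative) pairs; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted probabilities; 1 = BA, 0 = Non-BA (not study data).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(toy_scores, toy_labels)  # 8 of 9 pairs are concordant
```

The pairwise loop is quadratic in sample size, which is acceptable for a cohort of several hundred infants; rank-based formulas give the same value more efficiently.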

Results
During the study period, 875 infants with neonatal cholestasis were enrolled in PROBE. Strict criteria for BA and Non-BA inclusion were used in this analysis to increase the confidence for the predictive value of variables tested. Thus, 401 infants (Group 1) met criteria for the study definition of BA; 102 participants were classified clinically as BA by the study site, but after review of laboratory and operative data at presentation, these patients did not meet the strict study definition of BA and were excluded from analysis (Group 3: 58 excluded for lack of laboratory data at presentation and 44 for lack of operative demonstration of BA). Groups 1 and 3 were generally similar, except for a skewing of data to a "late" presentation in Group 3, which likely accounted for the decision to not proceed with hepatic portoenterostomy, thereby excluding those infants from Group 1 (S2 Table).
There were 259 of 372 infants enrolled in PROBE who did not have a clinical diagnosis of BA and met study criteria for Non-BA (Group 2). There were 113 infants (Group 4) with a clinical diagnosis of Non-BA excluded from analysis for potentially more than one reason, including: 1) inability to definitively exclude BA because, despite having a clinical diagnosis of indeterminate/IC, INH, choledochal cyst, or "other", either TB was still elevated (>1 mg/dL) beyond 120 days of age and/or there was no cholangiographic evidence of bile duct patency; 2) laboratory data were not available at presentation; and 3) laboratory data at presentation did not meet PROBE entry criteria. Groups 2 and 4 were similar (S3 Table). The clinical phenotype in Group 4 may have been milder, with less apparent hepatomegaly and lower biochemical markers of liver disease (TB, direct bilirubin, conjugated bilirubin, ALT, and AST).
Univariate analysis identified 13 variables (Table 1), which were significantly different (in bold) between BA and Non-BA (Group 1 vs. Group 2), including age at disease onset, stool color, sex, facial features, weight z-score, length z-score, head circumference z-score, centimeters of liver palpable below the costal margin, palpable spleen, GGTP, albumin, platelet count, and gallbladder sonography. Infants with BA were more likely to have acholic stools, to be female, to be younger at disease onset, and to have greater z-score growth parameters, normal facial features, more pronounced hepatosplenomegaly, higher GGTP, albumin, and platelet count, and a sonographically absent gallbladder. We used a hierarchical CART analysis to create an algorithm that could distinguish BA from Non-BA. In this approach, the population was segregated into either BA or Non-BA in a stepwise manner based on the single most predictive variable, using a threshold value derived empirically from the observed data. After this initial segregation, each newly-created sub-population was again evaluated using the most predictive variable that was redefined for this new subset of the population. In this manner, the predictive power of each variable was maximized at each step. The process of segregation and reanalysis was continued until there was no further improvement in the overall predictive power for the population. The results of this analysis are shown in Fig 1. The initial discriminator was a GGTP of 204 IU/L; those with lower levels were unlikely to have BA (40 [21%] out of 193 infants). In those with GGTP ≥204 IU/L and acholic stools, BA was likely (303 out of 467 infants). Further discrimination was achieved by incorporating weight z-score. Overall, the predictive capacity for this model was somewhat worse than the logistic regression modeling, with an AUC for the ROC of 0.831.
When the three-variable CART analysis was utilized, 12% of infants categorized as Non-BA (n = 247) were misclassified and had BA. Conversely, 17.5% of infants categorized as BA (n = 415) were misclassified and did not have BA.
The best logistic regression model selected included nine predictors: sex, acholic stools, normal facial features, ALT, GGTP, age at disease onset, weight z-score, palpable liver below the costal margin, and a sonographically absent gallbladder, which were associated with a diagnosis of BA (Table 2). Model discriminating ability was assessed by the ROC curve. Larger values on the 0-1 scale indicated a better predictive model. The final model yielded an AUC for the ROC analysis of 0.892 (Fig 2).

Three hundred fifty-seven infants had a predicted probability >0.8, of whom 290 had BA (81.2%). Of the 67 remaining Non-BA infants (19%) with a predicted probability of >0.8, 12 had alpha-1 antitrypsin deficiency, and 10 had Alagille syndrome (Table 3). One hundred thirty-six infants had a predicted probability of <0.2, of whom 120 had Non-BA (88.2%). Sixteen infants (12%) with scores <0.2 had BA and were evaluated at a mean of 63 days of age; most had normally pigmented stools and a gallbladder that was present. One hundred sixty-seven infants had intermediate predicted probability scores between 0.2 and 0.8.
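The percentages reported for the three predicted-probability bands follow directly from the counts given in the text, and the arithmetic can be checked with a short Python sketch. Only the counts come from the article; the variable names and the helper pct are illustrative.

```python
from typing import Dict


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts reported in the text.
N_BA, N_NON_BA = 401, 259            # analysis cohort: Group 1 and Group 2
N_TOTAL = N_BA + N_NON_BA            # 660 infants
HIGH_N, HIGH_BA = 357, 290           # predicted probability > 0.8
LOW_N, LOW_NON_BA = 136, 120         # predicted probability < 0.2

summary: Dict[str, float] = {
    "high_band_ba_pct": pct(HIGH_BA, HIGH_N),               # correctly flagged as BA
    "high_band_non_ba_pct": pct(HIGH_N - HIGH_BA, HIGH_N),  # Non-BA flagged as BA
    "low_band_non_ba_pct": pct(LOW_NON_BA, LOW_N),          # correctly flagged as Non-BA
    "low_band_ba_pct": pct(LOW_N - LOW_NON_BA, LOW_N),      # BA that would be missed
}

# Infants left in the intermediate band (0.2 to 0.8).
intermediate_n = N_TOTAL - HIGH_N - LOW_N
```

Running the sketch gives 81.2% and 18.8% for the high band, 88.2% and 11.8% for the low band, and 167 infants in the intermediate band, matching the figures above once rounded to whole percentages.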

Discussion
The quest for finding clinical and laboratory features that distinguish BA from other causes of neonatal cholestasis has been ongoing for over 50 years [11][12][13][14][15][16][17][18][19][20]. Early investigations of over 800 infants in five separate reports from Boston, Toronto, London, Houston, and Bicêtre demonstrated a difficulty in clinically distinguishing BA from intrahepatic cholestasis in a significant number of infants [11][12][13][14][15]. Infants with BA more frequently had acholic stools, had less failure to thrive, and had more pronounced elevation in biochemical markers of bile duct and canalicular injury, although these features were not uniformly discriminative. More recent reports have added radiologic and histologic features to the investigative paradigm [17][18][19].
Most of these studies have been single or two-center studies and retrospective in nature. The current analysis is based on data obtained in a large, truly multi-centered prospective study, which was particularly rigorous with regard to the study definition of BA and Non-BA and with the application of advanced statistical modeling methods. The purpose of the current study was to attempt to develop a diagnostic algorithm that could distinguish between BA and Non-BA using non-invasive parameters that were typically obtained during initial clinical evaluation of cholestatic infants. An effective algorithm might serve as a guide to physicians as to whether invasive procedures, such as liver biopsy and exploratory laparotomy, are warranted. The three variables in the CART analysis (serum GGTP, acholic stools, and weight z-score) that were statistically derived to achieve the best prediction of BA are simple, mostly objective, and readily available early in the course of the evaluation of cholestasis. Accurate classification of the stool pigmentation is the only somewhat subjective parameter in this algorithm [21]; however, recent simple smartphone technology may overcome this [22]. The predicted probability model that was developed achieved accurate diagnosis of BA in 290 out of 357 cases (81%) when the predicted probability was >0.8. Accuracy in these cases might be enhanced if alpha-1 antitrypsin levels and phenotype were readily available, and if features of Alagille syndrome were carefully assessed. One could argue that, for the infants with a predicted probability of >0.8 who had negative diagnostic testing for alpha-1 antitrypsin deficiency and Alagille syndrome, the next logical step would be exploratory laparotomy, and one might defer liver biopsy. An accurate diagnosis of Non-BA was predicted in 120 of 136 cases (88%) when the predicted probability was <0.2.
Conversely, an unsettling number of these infants had BA, whose diagnosis would be delayed or missed if one relied solely on these presenting clinical features to "exclude" BA. In addition, a significant number of infants had intermediate predicted probability scores between 0.2 and 0.8 and could not be classified as either BA or Non-BA.
It is clear from the current detailed analysis that clinicians should be very cautious about either diagnosing or excluding BA on the basis of presenting clinical features in infants with cholestasis. Family history is typically noninformative, but in selected circumstances can direct investigations toward specific inherited disorders like Alagille syndrome or familial intrahepatic cholestasis. Additional diagnostic investigations are typically warranted, and noninvasive approaches are often the first to be considered [23]. In the current study, only the presence of the gallbladder was considered on ultrasonography. More detailed evaluation for the triangular cord sign, gallbladder wall characteristics, and hepatic subcapsular blood flow was not conducted, although it may have increased the accuracy of the predictive model [18][24][25][26]. Hepatobiliary scintigraphy may be especially useful in excluding BA when intestinal excretion of radiotracer is demonstrated, although nonexcretion is less helpful since it is observed in BA and Non-BA [27]. Thus, in 60 of 67 cases where a predicted probability of >0.8 erroneously suggested BA, stools were pale or normal; in such infants, hepatobiliary scintigraphy may have been useful. The current analysis did not attempt to determine the added value of liver histology in the predictive algorithm, as the focus was to determine the predictive value of tests performed prior to subjecting infants to invasive testing. Liver histology can be quite informative in the evaluation of neonatal cholestasis, although false negative rates are disturbing given the consequences of late or missed diagnosis of BA [28,29]. In addition, the unnecessary exposure of infants to anesthesia (for liver biopsy, cholangiography, or laparotomy) has become a relevant issue in light of recent reports of potential long-term neurodevelopmental sequelae of general anesthesia in young children [30].
Clinicians should consider this issue when deciding about diagnostic testing that may require general anesthesia, including liver biopsy and endoscopic, percutaneous, or intraoperative cholangiography.

Conclusions
In conclusion, early accurate diagnosis of BA remains challenging. Clinicians are obliged to categorically exclude BA in the setting of neonatal cholestasis, since failure to make this diagnosis has potentially profound adverse consequences. This rigorous prospective analysis of presenting features in neonatal cholestasis was unable to generate a diagnostic algorithm that yielded sufficient ability to discriminate between BA and Non-BA in all patients. Early referral to a specialist, with consideration for possible liver biopsy or intraoperative cholangiography, needs to be entertained as soon as cholestasis is identified. Caution should be exercised in excluding BA based only on clinical non-invasive features. The identification of an alternative definitive diagnosis makes BA unlikely, although the Kasai hepatoportoenterostomy has been performed mistakenly in some infants with alternative diagnoses, including cystic fibrosis, alpha-1 antitrypsin deficiency, and Alagille syndrome [31][32][33][34][35]. Although not necessary for all infants with neonatal cholestasis, surgical exploration with operative cholangiography and/or pathologic examination of a bile duct remnant remains the only definitive means of making the diagnosis of BA.