Whole blood transcriptional profiles as a prognostic tool in complete and incomplete Kawasaki Disease

Background Early identification of children with Kawasaki Disease (KD) is key for timely initiation of intravenous immunoglobulin (IVIG) therapy. However, the diagnosis of the disease remains challenging, especially in children with an incomplete presentation (inKD). Moreover, we currently lack objective tools for identification of non-response (NR) to IVIG. Methods Children with KD were enrolled and samples obtained before IVIG treatment and sequentially at 24 h and 4–6 weeks post-IVIG in a subset of patients. We also enrolled children with other febrile illnesses [adenovirus (AdV); group A streptococcus (GAS)] and healthy controls (HC) for comparative analyses. Blood transcriptional profiles were analyzed to define: a) the cKD and inKD biosignature, b) compare the KD signature with other febrile illnesses and, c) identify biomarkers predictive of clinical outcomes. Results We identified a cKD biosignature (n = 39; HC, n = 16) that was validated in two additional cohorts of children with cKD (n = 37; HC, n = 20) and inKD (n = 13; HC, n = 8) and was characterized by overexpression of inflammation, platelets, apoptosis and neutrophil genes, and underexpression of T and NK cell genes. Classifier genes discriminated KD from adenovirus with higher sensitivity and specificity (92% and 100%, respectively) than for GAS (75% and 87%, respectively). We identified a genomic score (MDTH) that was higher at baseline in IVIG-NR [median 12,290 vs. 5,572 in responders, p = 0.009] and independently predicted IVIG-NR. Conclusion A reproducible biosignature from KD patients was identified, and was similar in children with cKD and inKD. A genomic score allowed early identification of children at higher risk for non-response to IVIG.


Introduction
Kawasaki disease (KD) is a febrile vasculitis of unknown etiology that affects young children. The estimated annual incidence is 17-21 per 100,000 children under the age of 5 in the United States [1][2][3]. Studies have shown that treatment within the first ten days of illness with intravenous immunoglobulin (IVIG) and aspirin significantly reduces the incidence of coronary artery abnormalities (CAA) [4]. However, the diagnosis of KD is challenging, especially for children whose presentation lack the full clinical spectrum, termed as "incomplete" KD. A delay or lack of diagnosis of both complete and incomplete KD may have long-term consequences [5].
Children with KD that develop persistent or recrudescent fever after IVIG are at higher risk of developing CAA. In Japan, the application of clinical scoring systems to identify children at higher risk for treatment failure and thus developing CAA, has proven useful to intensify primary treatment with IVIG with adjunctive treatments such as corticosteroids [6]. However, in the multi-ethnic, non-Japanese population such as that in the United States, identification of children at high-risk for non-response to IVIG and/or development of CAA has been difficult using clinical criteria alone.
Previous studies have demonstrated the value of gene expression profiling to aid in the assessment of disease severity in children with infectious or autoimmune diseases, and to differentiate KD from other mimicking conditions [7][8][9][10][11][12][13][14][15]. The major goals of this study were to utilize gene expression profiling: 1) to define a KD transcriptional biosignature that can aid in the characterization of complete and incomplete KD in children, 2) to define the specificity of the KD biosignature compared with that from children with other febrile illnesses such as adenovirus and Group A streptococcus infections (GAS), and 3) to assess the value of a genomic score [molecular distance to health (MDTH)] assessed before IVIG treatment to determine the likelihood of non-response to IVIG therapy.

Patient characteristics
From June 2007 to March 2013, we prospectively enrolled and obtained blood samples in a total of 162 children < 18 years of age; 125 children hospitalized with KD or other febrile illnesses and 37 healthy, asymptomatic children as controls who were age-, gender-and racematched. Of the 125 hospitalized children, 89 meet the definition of KD [76 met criteria for complete KD and 13 for incomplete KD based on the American Heart Association (AHA) criteria [5], 19 patients had confirmed adenovirus infection, and 17 children had GAS infection. All children with KD were enrolled before IVIG treatment and sequential samples collected at the right coronary artery or left anterior descending at any time point, and IVIG resistance as persistent or recrudescent fever at least 36 hours after completion of IVIG treatment.

Sample collection
Blood samples (1-3 mL) were collected in Tempus tubes (Applied Biosystems, CA, USA) before IVIG treatment in all KD patients, or at diagnosis in children with adenovirus and GAS infections. All samples were stored at -20˚C until further processing in batches. Whole blood RNA was processed and hybridized into Illumina Human HT12 V4 beadchips (47,323 probes) and scanned on the Illumina Beadstation 500 as described [9,14]. The data is deposited in the NCBI Gene Expression Omnibus (GEO accession number: GSE68004).

Microarray data and statistical analysis
Illumina GenomeStudio software was used to subtract background and scale average samples' signal intensity and GeneSpring GX 7.3 software was used to perform further normalizations and analyses [14,16,17]. Briefly, transcripts were first selected if they were present in ! 10% of all samples and had a minimum of 2-fold expression change compared with the median intensity across all samples [14]. Using this approach a total of 11,096 quality control transcripts was obtained. We then followed the strategy outlined below [13]: a) Supervised analysis (comparative analyses between predefined sample groups) was performed using Mann-Whitney (p< 0.01), followed by Benjamini-Hochberg multiple test corrections and a ! 1.25 fold change filter in expression level relative to the control group; b) Unsupervised clustering (unbiased grouping of samples based on their molecular profile without prior knowledge of sample classification) was applied to the test and validation sets; c) Class prediction using the K-Nearest Neighbors (KNN), with six neighbors and a P value ratio cutoff of 0.5, was applied to identify the top ranked genes that best discriminated between KD, adenovirus and GAS infections in the training and test sets. We, then, calculated the sensitivity and specificity of the top classifier transcripts in those test sets [13,14,16,18]; d) Functional gene analyses were performed using modular analysis as described [9,14,19]. Module transcript content and annotations are at http://www.biir.net/public_wikis/module_annotation/V2_Trial_8_Modules, and, e) Molecular distance to health (MDTH), a tool that converts the global transcriptional perturbation of each sample into an objective score compared with the healthy controls as reference, was calculated and correlated with parameters of disease severity. To calculate the MDTH scores we used the dispersion of expression values found in the baseline samples (controls) to determine whether the expression values of a patient's given sample lay outside two standard deviations (± 2SD) of the mean value for the healthy controls. Thus, each individual score represents the overall transcriptional perturbation ("distance") per patient sample in relation to the healthy control baseline. By summarizing the overall transcriptional activity in one score, the MDTH facilitates correlation with clinical parameters and has previously been applied in other disease states [9,14,20].
Univariate followed by multivariable logistic regression analyses were conducted to identify which factors independently predicted IVIG resistance, the primary outcome. Although the study was not adequately powered for this purpose, as an exploratory objective, we also examined if MDTH could predict the development of coronary artery abnormalities (CAA, either coronary aneurysm or ectasia). We first calculated the area under the receiver operating characteristic (ROC) curve (AUC) using the pre-treatment MDTH values to determine its ability to discriminate responders versus non-responders to IVIG. The optimal threshold for MDTH was defined using Youden's J statistic, which maximizes sensitivity and specificity [21]. For multivariable analyses, covariates were included in the model if they were significant in univariate analysis or clinically relevant. Predictor variables with a p <0.05 and multivariate odds ratios (OR) and 95% confidence intervals that did not include 1 were considered significant. Statistical analyses were performed using, Graph Pad Prism V5 (San Diego, CA) and SAS 9.3 (SAS Institute, Inc., Cary, NC).

Kawasaki Disease transcriptional signature is reproducible in children with both complete and incomplete presentation
To define the transcriptional KD signature, children with complete KD (cKD) were divided randomly in two cohorts: training (discovery) and test (validation) sets. Demographic, clinical and laboratory data are listed in Table 1. Statistical group comparisons identified 8,799 differentially regulated transcripts between 39 KD patients and 16 healthy matched controls in the first group of patients (training set; Fig 2A). There were 5,517 (62.7%) transcripts underexpressed and 3,282 (37.3%) overexpressed. This signature was validated in a second cohort of 37 patients with complete KD and 20 healthy controls using unsupervised hierarchical clustering which grouped 36 of 37 KD patients together ( Fig 2B). The only KD sample that clustered with the healthy control group was obtained in a 7-month-old male who was on day 9 of illness at the time of sample collection. A third, new group of children with incomplete KD (iKD; n = 13) and 8 matched HC was also analyzed. Unsupervised hierarchical clustering of the complete KD biosignature (8,799 transcripts) applied to this cohort correctly grouped together all patients with incomplete KD (Fig 2C).
To characterize the biological significance of the KD biosignature, we then applied an analytical framework of 62 transcriptional modules [19,22]. Module maps were derived independently for each of the complete KD cohorts (training set [ Fig 3A], test set [ Fig 3B]) as well as for children with incomplete KD (inKD) (Fig 3C). Children with complete KD demonstrated significant overexpression of modules related to inflammation, platelets, apoptosis, and neutrophils. Conversely, genes associated with B cells, T cells and cytotoxic/NK cells were significantly underexpressed. These findings were confirmed in the test (cKD) and validation (inKD) sets, as demonstrated by significant correlations of modular gene expression (r = 0.984 and 0.81, respectively Fig 3E and 3F).

Transcriptional profiles in Kawasaki disease versus adenovirus and Group A streptococcus infections
To determine whether the systemic host immune response in children with KD was different compared with those induced by adenovirus and GAS infections (two common mimicking illnesses) we applied a K-nearest neighbor (K-NN) class prediction algorithm (Fig 4). Confirmed adenovirus infection was defined as a positive nasopharyngeal culture or PCR and considered clinically consistent with adenovirus disease. Children with GAS infection, included those with a positive culture from a sterile site (n = 14), or scarlet fever (n = 3).
The KNN algorithm identified 25 classifier genes that best discriminated KD from adenovirus infection in two independent groups of patients (Fig 4). The 25 classifier genes correctly    (Table 2A). We also applied the KNN algorithm to discriminate KD from GAS and identified 10 classifier genes that best discriminated KD from GAS infection also in two independent cohorts of patients. The 10-classifier genes correctly classified 16 of 20 samples in the training set, and 13 of 16 in the test set. The 10 classifier genes discriminated KD from GAS infection with 75% (95% CI [34%-96%]) sensitivity and 87% [47%-99%] specificity in the test set (Table 2B). S1 Table includes a description of the respective 25 and 10 classifier genes.

Molecular distance to health scores and clinical outcomes
We next investigated whether a genomic score (molecular distance to health [MDTH] score) could help predicting: a) the response to initial treatment with IVIG, and b) the development of coronary artery abnormalities in children with complete and incomplete KD. Of 77 KD patients in whom complete clinical data were available, 10 (13%) did not respond to therapy.  By the 5-8 week visit, 12 (15.5%) subjects had abnormal dimensions for the coronary arteries on at least one echocardiogram. MDTH scores from samples obtained at enrollment and before IVIG therapy were significantly higher in children who remained febrile 36 hours after initial treatment with IVIG and required additional therapy compared with those who responded (Fig 6). There was no correlation of MDTH with the development of CAA. The MDTH scores at enrollment correlated weakly with C-reactive protein (CRP) serum concentrations (r = 0.29, p = 0.008) and inversely correlated with days of fever (r = -0.22, p = 0.03).

Longitudinal assessment of molecular distance to health scores
Sequential samples were available from 18 patients before and 24 hours after completion of IVIG treatment. Of these 18 patients, three did not respond to initial IVIG treatment and one developed coronary artery dilatation. In addition, 13 out of these 18 patients had samples available at 5-8 weeks post treatment (convalescent). Longitudinal analysis showed a significant decrease in MDTH scores in 14 of 18 patients 24 hours after completion of IVIG (Fig 7). Of the four patients in whom MDTH increased at 24h post-IVIG, two had slight increased values and responded to IVIG treatment, another developed coronary artery dilation, and the fourth child had a significant increase in MDTH values and did not respond to the first dose of IVIG. By 5-8 weeks after IVIG, MDTH values significantly decreased in the majority of patients (85% or 11 of 13 patients). There were two exceptions, one patient was an IVIG-non responder and required a corticosteroid medication taper. He had residual arthralgias on the day of sample collection. The other patient was doing clinically well on the follow-up evaluation, but had a documented visit to the urgent care one week earlier and was diagnosed with an upper respiratory tract infection.

Molecular distance to health scores independently predicts response to intravenous immunoglobulin therapy at enrollment
To determine the ability of MDTH to discriminate between response and non-response to IVIG, we constructed the receiver operating characteristic (ROC) curve (AUC) using the pre-  . Low values of MDTH (< 10,428) using this threshold had an 88% sensitivity and 60% specificity for classifying KD patients responders to treatment with IVIG (Fig 8).
Last, we performed multivariable logistic regression analysis to identify which factor(s) alone or in combination best predicted response to IVIG therapy. After adjusting for gender, days of fever, type of KD (incomplete vs. complete), albumin, and CRP concentrations, only high MDTH values (OR = 1.72 [1.18-2.48]) and younger age (OR = 0.94 [0.89-0.99]), independently predicted lack of response to IVIG therapy (Table 3). MDTH scores were analyzed in 18 patients with available sequential samples: acute (pre-treatment), 24 hours after IVIG (24 post IVIG), and convalescent (5-8 weeks after treatment). The three patients that failed to respond to the first dose of IVIG (IVIG-NR) are shown in red color, and the patient with coronary artery dilation, is shown in blue color. One subject, marked with an asterisk ( Ã ), was a non-responder to IVIG, required a corticosteroid taper and had residual arthralgias on the follow-up visit during convalescence (in red). https://doi.org/10.1371/journal.pone.0197858.g007

Discussion
Using RNA transcriptional profiling, we identified a distinct biosignature in children with complete KD that was validated in two independent patient cohorts, one comprised of children with complete KD and a second of children with incomplete KD. We also identified classifier genes that discriminated KD patients from those with adenovirus infection quite precisely and less so from patients with GAS, two common conditions that mimic KD. Lastly, we defined a genomic MDTH score at presentation that was associated with resistance to IVIG   Fig 8. Area under the ROC curve (AUC) of MDTH scores to predict response to IVIG therapy. The optimal threshold for MDTH defined using Youden's J statistic, which maximizes sensitivity and specificity is 10,428. When MDTH is dichotomized as "low" or "high" based on this threshold, low MDTH values have 88% sensitivity and 60% specificity for classifying patients as responders, and AUC remains good at 0.741.
https://doi.org/10.1371/journal.pone.0197858.g008 therapy. Overall, this study confirms the potential application of transcriptional profiling for the diagnosis of KD and the utility of genomic scores to predict response to therapy in children with KD before IVIG treatment. Previous studies have also described overexpression of innate immunity transcripts and down regulation of T and B cell receptors and NK cell signaling in patients with KD compared with febrile control patients [23][24][25]. In the present study, we identified a distinct and reproducible KD biosignature that was validated in a second independent cohort of KD patients, and more importantly in a third group of children presenting with incomplete KD, supporting the value of the current clinical criteria for the diagnosis of KD.
Host expression patterns based on RNA transcriptional profiles have been used to distinguish bacterial from viral infection in young febrile infants and children [13,18,26], and also to discriminate patients with KD from those with adenovirus infection [15]. Although appealing, it is challenging to use host expression patterns in a disease that lacks diagnostic confirmation, especially in those with the less common incomplete presentation of the disease. One example of a similar scenario in which diagnostic confirmation is challenging is pediatric tuberculosis. In a recent study, a genomic signature was identified first in children with confirmed TB and was then applied to patients with "probable" or "possible" disease with 82% sensitivity and 84% specificity. This approach emphasizes the utility of genomic signatures when initially applied to patients with clearly defined illness, followed by application to other cohorts with less defined clinical diagnosis [14,27]. We followed a similar approach as we first derived the KD biosignature from children with complete KD, and then applied it to children with less clearly defined illness, incomplete KD. Indeed, it was remarkable to observe that patients with incomplete KD had a profile almost identical to those with cKD. The peripheral blood KD biosignature, for both complete and incomplete KD, was characterized by marked overexpression of inflammation, apoptosis, platelets and neutrophil genes, and with marked underexpression of cytotoxic T cell and NK cell related genes. This is in contrast to the findings of gene expression at the level of coronary tissue, where cytotoxic T cells have been shown to be upregulated [28]. These findings have significant implications at they suggest that complete and incomplete KD share similar immunopathogenesis. In addition, these findings emphasize the importance of early identification and treatment of patients with incomplete KD, as their immune response is similarly dysregulated. It also adds to the validation of the AHA clinical criteria to identify and diagnose patients with incomplete KD [29,30].
Distinguishing patients with adenovirus infection from KD can be difficult as adenovirus infections are common in this age group, often result in elevated inflammatory markers, and because adenovirus can be detected incidentally in children with KD [31,32]. The present study showed that transcriptional profiling can help discriminate adenovirus from KD with high accuracy, confirming previous reports [15]. The modular analysis provided additional detail of the biological differences between the two conditions. KD patients had overexpression of two of the three interferon modules. The overexpression of module 1.2, more reflective of type I interferon, however, was absent in children with KD [33]. In agreement with these findings, previous studies have reported that KD patients had reduced expression of type I interferon transcripts compared with those with adenovirus infection [15]. Using transcriptional profiles to differentiate KD from GAS infection proved to be more difficult, though distinguishing KD from GAS disease in the clinical setting is less problematic as the organism can be isolated from the pharynx or blood in most cases of invasive disease. The majority of patients in this group had GAS identified from a normally sterile body site, so were not the typical patient in whom KD is considered in the differential diagnosis, and only 3 patients had scarlet fever, which would be more typically associated in the differential diagnosis of KD. Earlier studies have described similarities in gene expression profiles between bacterial infections and various autoimmune processes [11], likely reflecting downstream inflammatory cascades that are shared between these conditions. Kawasaki disease is a dynamic process reflecting acute and subacute stages and our patients were enrolled at varying days of fever. Increasing days of fever at the time of the sample collection was inversely correlated with the MDTH score. Further investigation with a substantial number of patients treated later in the course of KD illness will be extremely important, as these children may no longer exhibit the biosignature observed in children diagnosed earlier.
Thus, we believe that duration of fever is likely a crucial factor to consider when interpreting host expression data, and we accounted for such factor when we conducted our multivariable models.
Japanese clinical scoring systems have been successful in identifying high-risk patients to facilitate use of adjuvant therapies such as corticosteroids for primary treatment of KD, which have been associated with decrease coronary artery aneurism development in these children [6]. Those scoring systems have not been validated in ethnically diverse non-Japanese populations [34,35]. Identification of potential non-responders before IVIG treatment using genomic scores (MDTH) in an ethnically diverse population may potentially allow early use of adjunctive therapies in selected patients, which in turn could be associated with improved coronary outcomes [6,30]. Previous studies have utilized whole blood transcriptional patterns to identify specific transcripts that were over-expressed in patients with eventual IVIG resistance [36,37], as well as serum biomarkers and cytokines such as G-CSF and sTNFR1 that were correlated with lack of response to therapy. [38,39][40]. We utilized a slightly different approach with the MDTH score, and instead of focusing on a specific marker, we evaluated the overall molecular perturbation in KD patients in relation to the healthy control baseline. The MDTH scores from KD patients before IVIG treatment were significantly higher in children who were eventually IVIG resistant. Furthermore, among the laboratory variables evaluated, only high MDTH scores independently and incrementally predicted lack of response to therapy.
Our study has limitations. Because the limited number of patients with coronary abnormalities in our cohort, we were unable to identify any correlation between pre-treatment MDTH scores and development of coronary abnormalities (CAA). Profiles were derived from peripheral whole blood and may not reflect the pathophysiology of arterial damage, the most relevant site of inflammation in KD. On the other hand, KD is a systemic process and this type of analytical strategy has been used successfully to identify biomarkers and characterize the immunopathogenesis of other childhood vasculitides [11,33].
In summary, this study provides further evidence of the marked dysregulation of the systemic inflammatory response induced by KD disease and confirms the value of transcriptional profiling as a promising strategy to determine eventual lack of response to IVIG, after adjusting for other variables in this cohort. As new and potentially more rapid technologies become available, transcriptional profiling in the clinical setting has the potential for use to improve the risk stratification of KD patients.
Supporting information S1 Table. 10 classifier genes that best discriminated between Kawasaki disease and Group A streptococcus infection and the 25 classifier genes that best discriminated between Kawasaki disease and adenovirus infection. The systematic, common and genebank names are included in the first three columns. Function of those genes and analytical groups (KD vs GAS or KD vs. HAdV) are included in the 4 th and 5 th columns.