Figure 1.
Discovery of Crohn’s disease (CD) and ulcerative colitis (UC) inflammatory subtypes by gene expression profiling.
(A) Unsupervised hierarchical clustering of UC cases (blue), CD cases (red) and healthy controls (green) using the 54,675 probes contained in the chip. Up-regulated genes are shown in red and down-regulated genes in green. (B) Supervised hierarchical clustering of CD cases or UC cases using the genes differentially expressed between CD and healthy controls (261 probes) and between UC and healthy controls (1255 probes), respectively. This process defined two subgroups for both CD and UC cases. Functional analyses were performed using the PANTHER Classification System. Examples of genes in each category are shown. (C) Clustering of samples using principal component analysis: CD1 (pink), CD2 (red), UC1 (light blue), UC2 (dark blue), healthy controls (green). (D) Association between clinical characteristics and CD and UC subtypes. Data represent the proportion of patients of each disease subtype within each clinical variable; *p<0.05, Fisher exact test, CD1 vs. CD2 or UC1 vs. UC2. Complete clinical data are provided in Table S1. (E) Number of differentially expressed genes (p<0.05, t-test; fold change >1 or <−1) in the different comparisons among groups.
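As an illustration of the Fisher exact test named in panel (D), the following is a minimal sketch of a two-sided test on a 2x2 table of subtype versus a clinical variable. The function name and the counts are hypothetical and are not taken from the study; the two-sided p value is computed with the common convention of summing the probabilities of all tables no more likely than the observed one, which may differ from the software the authors used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (a small tolerance guards against floating-point ties)
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts (NOT from the paper): 8 of 10 patients in one subtype
# and 2 of 10 in the other present a given clinical feature.
example_p = fisher_exact_two_sided(8, 2, 2, 8)
```

With counts like these, a p value below 0.05 would be marked with an asterisk in a panel of this kind.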
Figure 2.
Identification of predictive classifiers of “High” and “Low” inflammation subtypes of Crohn’s disease (CD) and ulcerative colitis (UC).
(A) A class predictor was built for both CD and UC using the tool Prophet and the 54,675 probes contained in the chip, with a leave-one-out cross-validation strategy. For each predictor, patients that were correctly classified are shown in green, while those that the predictor failed to classify correctly are shown in red. (B) Based on the accuracy of each classifier, a 10-gene predictor was selected in both cases; these predictors accurately classified 92% of CD patients and 100% of UC patients. Up-regulated genes are shown in red and down-regulated genes in blue.
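The leave-one-out cross-validation strategy mentioned in panel (A) can be sketched as follows. This is only an illustration: it uses a simple nearest-centroid classifier as a stand-in, which is an assumption and not necessarily the algorithm that Prophet applies, and the expression values, labels and function names are invented for the example.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose mean profile is closest."""
    sums, counts = {}, {}
    for profile, label in zip(train_x, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], profile)]
        counts[label] += 1

    best_label, best_distance = None, float("inf")
    for label, total in sums.items():
        centroid = [s / counts[label] for s in total]
        # Squared Euclidean distance to the class centroid
        distance = sum((v - c) ** 2 for v, c in zip(sample, centroid))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def leave_one_out_accuracy(profiles, labels):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(profiles)):
        train_x = profiles[:i] + profiles[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, profiles[i]) == labels[i]:
            correct += 1
    return correct / len(profiles)


# Hypothetical two-gene expression profiles (NOT data from the study)
toy_profiles = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8],
                [3.0, 3.1], [2.9, 3.3], [3.2, 2.8]]
toy_labels = ["low", "low", "low", "high", "high", "high"]
toy_accuracy = leave_one_out_accuracy(toy_profiles, toy_labels)
```

Each patient is classified by a model that never saw that patient during training, so the resulting accuracy is an estimate of performance on unseen samples rather than a fit to the training set.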
Figure 3.
Confirmation of the predictive classifiers of “High” and “Low” inflammation subtypes of Crohn’s disease (CD) and ulcerative colitis (UC) by quantitative real time-PCR.
A total of 43 genes (including the predictors) were selected for real time-PCR validation of the microarray data in CD (A) and UC (B). Scatter plots show the correlation between microarray data (fluorescence intensity, FI) and PCR data (DCt value relative to the endogenous GAPDH, DCt), validating the microarray results. The tables show the predictions obtained when the gene sets selected from the microarray analysis were applied to the PCR data. For each predictor, patients that were correctly classified are shown in green, while those that the predictor failed to classify correctly are shown in red. The classification accuracy obtained with the PCR data was the same as that obtained with the microarray data.
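The agreement between the two platforms shown in the scatter plots could be quantified along these lines. The legend does not state which correlation coefficient was used, so Pearson's r is assumed here, and the paired values and the function name are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical paired measurements for one gene across four samples
# (NOT data from the study): microarray FI and PCR DCt.
fluorescence = [120.0, 340.0, 560.0, 910.0]
dct = [9.1, 7.8, 6.9, 5.2]
r = pearson(fluorescence, dct)
```

Because a lower DCt corresponds to a more abundant transcript, concordant measurements of this kind would be expected to give a negative coefficient.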
Figure 4.
Systematic search for new predictors associated with clinical variables.
Predictors were built using the tool Prophet and the 54,675 probes contained in the microarray, with a leave-one-out cross-validation strategy, to identify classifier genes for glucocorticoid sensitivity (A), glucocorticoid dependency (B) or need for surgery (C). Data are represented as box-and-whisker plots, where the error bars designate the smallest and largest observations and dots designate the outliers. Data were analyzed by two-way ANOVA followed by Bonferroni post test; p values higher than 0.05 were considered not significant (ns), *p<0.01, **p<0.001.
Figure 5.
Systematic search for new predictors associated with IBD.
Predictors were built using the tool Prophet and the 54,675 probes contained in the chip, with a leave-one-out cross-validation strategy, to identify classifier genes for inflammatory bowel disease, IBD (A), Crohn’s disease, CD (B), ulcerative colitis, UC (C), CD vs. UC (D), low inflammation subtypes of CD and UC (E), and high inflammation subtypes of CD vs. UC (F). Data are represented as box-and-whisker plots, where the error bars designate the smallest and largest observations and dots designate the outliers. Data were analyzed by two-way ANOVA followed by Bonferroni post test; p values higher than 0.05 were considered not significant (ns), *p<0.01, **p<0.001.