Table 1.
Composition of study cohorts.
Figure 1.
Schematic of the analysis pipeline for integrative analysis of multiple SSc skin datasets.
(A) Each microarray dataset (Milano et al., Pendergrass et al., and Hinchcliff et al.) was independently clustered by WGCNA into gene coexpression modules (colored circles). Each module is a set of genes that was highly correlated within a dataset. (B) Modules were compared across datasets using a novel procedure (MICC) to determine which were approximately conserved across all three datasets. The network in (B) is called the information graph and encodes the nontrivial overlaps of modules across datasets. Triangles in this network correspond to approximately conserved modules across all three datasets. Communities in this network (dotted ovals) represent collections of modules that are conserved together and thus have similar biological function. Note that communities in the network can overlap (e.g. module P1 in the schematic belongs to two communities). (C) Genes derived from the module communities are called consensus genes and were used for downstream bioinformatics analyses including gene ontology enrichment analysis using the g:Profiler tool, testing for intrinsic subset-specificity, and functional interaction network analysis using the IMP functional network. Each of these downstream analyses is independent and complementary.
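The construction in panel (B) can be illustrated with a minimal sketch. It assumes modules are plain gene sets and uses a Jaccard threshold as a stand-in for the overlap criterion; the actual MICC statistic is not specified in this caption. The function names, the threshold value, and the toy gene labels are all hypothetical.

```python
from itertools import combinations, product


def jaccard(a, b):
    """Jaccard similarity of two gene sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def information_graph(modules, threshold=0.3):
    """Link modules from different datasets whose gene sets overlap.

    modules: dict of dataset name -> dict of module name -> set of genes.
    ASSUMPTION: a Jaccard cutoff replaces the real overlap test.
    """
    edges = set()
    for d1, d2 in combinations(sorted(modules), 2):
        for (m1, g1), (m2, g2) in product(modules[d1].items(), modules[d2].items()):
            if jaccard(g1, g2) >= threshold:
                edges.add(frozenset([(d1, m1), (d2, m2)]))
    return edges


def triangles(modules, edges):
    """Enumerate triples (one module per dataset) that are pairwise linked."""
    names = sorted(modules)
    per_dataset = [[(d, m) for m in sorted(modules[d])] for d in names]
    found = []
    for a, b, c in product(*per_dataset):
        if all(frozenset(pair) in edges for pair in ((a, b), (a, c), (b, c))):
            found.append((a, b, c))
    return found


# Toy gene sets (hypothetical labels, not real data).
modules = {
    "Milano": {"M1": {"A", "B", "C", "D"}, "M2": {"X", "Y"}},
    "Pendergrass": {"P1": {"A", "B", "C"}, "P2": {"Q"}},
    "Hinchcliff": {"H1": {"B", "C", "D"}, "H2": {"X", "Z"}},
}
edges = information_graph(modules)
conserved = triangles(modules, edges)
```

Because edges only join modules from different datasets, the graph is tripartite, and any triangle necessarily contains one module from each dataset.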
Figure 2.
Gene expression modules associated with the intrinsic subsets of SSc.
We identified 54 major sets of genes (modules) using WGCNA that define the spectrum of gene expression in SSc skin, using the Milano dataset as a test case. The six most significant modules are shown, and each shows a statistically significant association with the intrinsic subsets (including the limited subset). Module assignment for each gene is unique. The genes that compose the subset-specific modules represent more than 40% of the protein-coding genes in the human genome. Therefore, the intrinsic subsets seem to be determined by a large fraction of the encoded genes. The module eigengene of each module is shown in a stem plot below each heatmap, with intrinsic subsets indicated by color above the heatmap. Proliferative, red; inflammatory, purple; limited, yellow; normal-like, green. (A) Inflammatory modules (p<10^-9 and p<10^-7; Kruskal-Wallis non-parametric ANOVA corrected for multiple testing), (B) Limited module (p<0.006), (C) Fibroproliferative modules (p<10^-7 and p<10^-8), (D) Fibroproliferative and limited expression module (p<10^-9). Enriched molecular processes are indicated for each subset to the right of each heatmap.
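For readers unfamiliar with the module eigengene, the following is a minimal sketch of how it is commonly computed: as the first principal component of a module's standardized expression matrix. The function name, the sign convention, and the simulated data are illustrative assumptions, not the authors' code.

```python
import numpy as np


def module_eigengene(expr):
    """First principal component of a genes x samples module matrix."""
    # Standardize each gene (row) across samples.
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    # Rows of vt are unit vectors in sample space; the first is the eigengene.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # ASSUMPTION: orient the sign to agree with the average expression profile.
    if np.corrcoef(eigengene, z.mean(axis=0))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


# Simulated module: 5 genes sharing one pattern across 6 samples (toy data).
rng = np.random.default_rng(0)
signal = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
loadings = np.linspace(1.0, 2.0, 5)
expr = loadings[:, None] * signal[None, :] + 0.1 * rng.standard_normal((5, 6))
eigengene = module_eigengene(expr)
```

The resulting vector has one value per sample, which is what a stem plot beneath a heatmap displays; a test such as Kruskal-Wallis can then be applied to those values grouped by intrinsic subset.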
Figure 3.
Information graph and consensus clusters for the MPH cohorts.
(A) The information graph of the MPH cohorts is highly modular (cf. S2 Fig.), indicating approximate conservation of gene expression modules across datasets. The information graph is tripartite by construction, so a triangle in the graph necessarily connects modules across all three datasets. The triangles form communities of mutual edge sharing. Colored nodes and edges highlight four of these communities. The purple community contains modules that are up-regulated in the inflammatory subset (cf. panel B). The red community contains modules that are up-regulated in the fibroproliferative subset (cf. panel B). The cyan community contains modules that are enriched for keratinocyte-specific processes. The orange community contains modules that are enriched for fatty acid metabolism genes. The remaining communities (22 in all and not colored to avoid cluttering the display) are enriched primarily for housekeeping processes and are neither skin- nor disease-specific (see Table 3). (B) Modules from the communities were tested for their enrichment in the subsets. Each row corresponds to a triangle in the information graph and each column corresponds to a dataset. The black lines separate communities, e.g., all of the rows in the block marked “1” correspond to triangles in community 1. The cells are colored according to whether the module was significantly differentially expressed in a subset, with dark colors representing up-regulation and light colors representing down-regulation (Bonferroni-corrected Wilcoxon rank-sum test, p<0.05). We assessed statistical significance of modules within each dataset for each of the three diffuse SSc intrinsic subsets, as well as for all SSc vs. healthy controls (purple, inflammatory; red, proliferative; green, normal-like; blue, all SSc). Note the inflammatory up community (*) and the fibroproliferative up community (**).
Note also that community 2 is significantly highly expressed in the inflammatory subset and lowly expressed in the proliferative subset in Milano only. Likewise, community 9 appears to be expressed at low levels in the inflammatory subset in Milano, but in none of the other datasets.
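The phrase "communities of mutual edge sharing" in panel (A) can be read as follows: two triangles belong to the same community when they share an edge, and this relation is extended transitively. The sketch below implements that reading with a small union-find; it is an assumption about the grouping rule, and the module labels are hypothetical.

```python
from itertools import combinations


def triangle_communities(triangles):
    """Group triangles that share an edge; return each community's node set."""
    tris = [frozenset(t) for t in triangles]
    parent = list(range(len(tris)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(tris)), 2):
        if len(tris[i] & tris[j]) == 2:  # two shared nodes = one shared edge
            parent[find(i)] = find(j)

    groups = {}
    for i, tri in enumerate(tris):
        groups.setdefault(find(i), set()).update(tri)
    return sorted(groups.values(), key=sorted)


# Toy triangles (hypothetical module labels, one module per dataset).
toy_triangles = [("M1", "P1", "H1"), ("M2", "P1", "H1"), ("M3", "P1", "H2")]
communities = triangle_communities(toy_triangles)
```

In this toy case the first two triangles share the edge P1-H1 and merge, while the third shares only the node P1 with them, so P1 ends up in two communities. Communities defined this way can therefore overlap.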
Table 2.
Statistics of module conservation.
Table 3.
Molecular processes enriched in the 13 largest consensus clusters.
Figure 4.
Molecular network of inflammatory and fibroproliferative consensus genes.
The consensus genes for the inflammatory and fibroproliferative subsets are connected in the IMP functional network. Inflammatory genes are colored purple, while fibroproliferative genes are colored red. Genes with polymorphisms are colored green and MRSS biomarker genes are colored yellow. One MRSS biomarker gene (IFI44) was also an inflammatory consensus gene (pink), while three polymorphic genes were inflammatory consensus genes (turquoise). Note the five distinct subnetworks corresponding to type I interferons, M2 macrophages, ECM proteins and TGFβ signaling, adaptive immunity, and cell proliferation. The interferon, M2 macrophage, and adaptive immunity subnetworks are composed almost exclusively of inflammatory genes, while the ECM subnetwork shares genes from both intrinsic subsets. Furthermore, the polymorphic genes interact primarily with inflammatory subset genes, indicating that the genetic risk in SSc is related to immune abnormalities.
Figure 5.
Hubs in the inflammatory and ECM components of the network.
The putative MRSS biomarker gene IFI44 is a hub of the type I interferon subnetwork. AIF1, which contains SSc-associated polymorphisms and is related to M2 macrophage polarization, is a hub of the M2 macrophage subnetwork. FBN1, which contains SSc-associated polymorphisms in some populations and is a key component of ECM that regulates matrix stiffness, is a hub of the TGFβ/ECM subnetwork. The tyrosine kinase gene LYN, which is associated with B cell activation and with mediating self-tolerance, is a hub of the adaptive immunity subnetwork.
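One simple way to nominate hubs is to rank genes by degree within a subnetwork. The sketch below assumes that criterion; the caption does not state how hubs were defined, and the edge list uses placeholder node names rather than IMP data.

```python
from collections import defaultdict


def degrees(edges):
    """Count how many edges touch each node."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def top_hubs(edges, subnetwork, k=1):
    """Highest-degree nodes, counting only edges inside the subnetwork."""
    inside = [(u, v) for u, v in edges if u in subnetwork and v in subnetwork]
    deg = degrees(inside)
    # Sort by descending degree, breaking ties alphabetically.
    return sorted(deg, key=lambda n: (-deg[n], n))[:k]


# Placeholder network (hypothetical nodes g1..g5).
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g4", "g5")]
toy_subnetwork = {"g1", "g2", "g3", "g4"}
hubs = top_hubs(toy_edges, toy_subnetwork)
```

Other centrality measures, such as betweenness, would be equally reasonable choices and may rank genes differently.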
Figure 6.
Bridges between components of the network.
Several genes bridge the component subnetworks of the molecular network. PLAUR, a gene that contains SSc-associated polymorphisms, forms a bridge between the interferon subnetwork and the TGFβ/ECM subnetwork. The gene RAC2 is a bridge between the interferon and M2 macrophage subnetworks. The genes LCP2 and CXCR4 are bridges between the M2 macrophage subnetwork and the adaptive immunity subnetwork. There are also several paths through GRB10 to ADAP2 between the M2 macrophage subnetwork and the adaptive immunity subnetwork. The genes CD14 and THY1 (CD90) are bridges between the M2 macrophage subnetwork and the TGFβ/ECM subnetwork. The genes IRAK1 and PXK are bridges between the TGFβ/ECM subnetwork and the cell proliferation subnetwork.
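A bridge in this sense can be sketched as a gene whose immediate neighborhood touches more than one subnetwork. The rule below is an illustrative assumption, since the caption does not give a formal definition, and the nodes and labels are placeholders.

```python
from collections import defaultdict


def bridge_genes(edges, membership):
    """Genes whose neighborhood (including themselves) spans two or more subnetworks.

    membership: dict of gene -> subnetwork label; unassigned genes are absent.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    bridges = {}
    for gene, adjacent in neighbours.items():
        labels = {membership[n] for n in adjacent | {gene} if n in membership}
        if len(labels) >= 2:
            bridges[gene] = sorted(labels)
    return bridges


# Placeholder network: node "x" connects subnetworks "A" and "B".
toy_edges = [("a1", "a2"), ("a2", "x"), ("x", "b1"), ("b1", "b2")]
toy_membership = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
bridges = bridge_genes(toy_edges, toy_membership)
```

Longer connecting paths, such as those described through GRB10 to ADAP2, would need a path search rather than this one-step neighborhood check.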
Figure 7.
Model of interactions among the components of the network.
The molecular network of Fig. 4 is densely interconnected, implicating many possible interactions between the core molecular processes (interferon activation, M2 macrophage activation, adaptive immunity, ECM remodeling, and cell proliferation). Stepping back from the granular detail of single genes, we see a system of distinct parts through which SSc could be initiated and maintained. Among these are paths of particular interest. The interferon subnetwork and the M2 macrophage subnetwork are connected by RAC2. The M2 macrophage subnetwork in turn is connected to the ECM subnetwork through paths containing CD14 and THY1, suggesting that macrophages may influence or drive ECM abnormalities in skin. The interferon subnetwork and the ECM subnetwork are connected through paths containing the pleiotropic and polymorphic gene PLAUR. The M2 macrophage subnetwork is connected to the adaptive immunity subnetwork through several distinct sets of paths containing the genes GRB10, LCP2, and CXCR4. The ECM subnetwork is connected to the cell proliferation subnetwork through TGFβ pathway genes and paths containing the polymorphic genes IRAK1 and PXK, which suggests that ECM remodeling modulates cell proliferation through the TGFβ pathway. The interferon node may negatively regulate proliferation via the ERK/MAPK pathway, resulting in the general mutual exclusivity of the inflammatory and fibroproliferative subsets. Thus, we see a set of interconnected, balancing feedback loops that can enforce subset homeostasis but also allow patients to transition between the subsets, possibly in response to therapy.