Peripheral Blood Gene Expression as a Novel Genomic Biomarker in Complicated Sarcoidosis

Sarcoidosis, a systemic granulomatous syndrome invariably affecting the lung, typically spontaneously remits but in ∼20% of cases progresses with severe lung dysfunction or cardiac and neurologic involvement (complicated sarcoidosis). Unfortunately, current biomarkers fail to distinguish patients with remitting (uncomplicated) sarcoidosis from other fibrotic lung disorders, and fail to identify individuals at risk for complicated sarcoidosis. We utilized genome-wide peripheral blood gene expression analysis to identify a 20-gene sarcoidosis biomarker signature distinguishing sarcoidosis (n = 39) from healthy controls (n = 35, 86% classification accuracy) and which served as a molecular signature for complicated sarcoidosis (n = 17). As aberrancies in T cell receptor (TCR) signaling, JAK-STAT (JS) signaling, and cytokine-cytokine receptor (CCR) signaling are implicated in sarcoidosis pathogenesis, a 31-gene signature comprised of T cell signaling pathway genes associated with sarcoidosis (TCR/JS/CCR) was compared to the unbiased 20-gene biomarker signature but proved inferior in prediction accuracy in distinguishing complicated from uncomplicated sarcoidosis. Additional validation strategies included significant association of single nucleotide polymorphisms (SNPs) in signature genes with sarcoidosis susceptibility and severity (unbiased signature genes: CX3CR1, FKBP1A, NOG, RBM12B, SESN3, TSHZ2; T cell/JAK-STAT pathway genes such as AKT3, CBLB, DLG1, IFNG, IL2RA, IL7R, ITK, JUN, MALT1, NFATC2, PLCG1, SPRED1). In summary, this validated peripheral blood molecular gene signature appears to be a valuable biomarker in identifying cases with sarcoidosis and predicting risk for complicated sarcoidosis.


Introduction
Individuals with sarcoidosis, a systemic inflammatory and noncaseating granulomatous disease of unknown origin affecting multiple organs and invariably the lung [1,2], typically undergo spontaneous resolution. However, ∼20% of affected individuals experience progressive disease with respiratory, cardiac or nervous system involvement. Complicated sarcoidosis is defined as exhibiting either cardiac manifestations (e.g., ventricular arrhythmias) [3], neurologic involvement (e.g., with evidence of hyperdense MRI lesions) [4] or deteriorating lung function (e.g., FVC < 50%). Currently, FDA-approved therapies for complicated sarcoidosis do not exist and corticosteroids and corticosteroid-sparing immunosuppressive agents (TNFα inhibitors) have met with only limited success [5]. The accurate identification of individuals with or at risk for complicated sarcoidosis is a vexing clinical challenge with attempts to define clinically-useful biomarkers largely unsuccessful. Sarcoidosis biomarkers are desperately needed to deliver targeted therapies in individuals with complicated sarcoidosis and to identify patients at risk for increased morbidity and significant mortality as a consequence of complicated sarcoidosis.
Our study was designed to identify novel genomic biomarkers by comparing genome-wide gene expression data in African American (AA) and European ancestry (EA) sarcoidosis cases. We identified a universal gene signature that differentiates sarcoidosis patients from healthy controls and distinguishes complicated sarcoidosis (pulmonary with FVC < 50%, cardiac, or neurologic sarcoidosis) from uncomplicated sarcoidosis. This gene signature was superior in prediction accuracy in each of the AA and EA populations when compared to a second signature comprised of genes within the T cell receptor-innate immunity pathway that includes genes previously associated with sarcoidosis. These signatures distinguished sarcoidosis patients from idiopathic pulmonary fibrosis (IPF) cases, with signature validation provided by significant association of genetic variants within signature genes with sarcoidosis susceptibility. These results highlight the utility of peripheral blood molecular gene signatures as valuable biomarkers for predicting individuals at risk for complicated sarcoidosis and for potentially facilitating individualized therapies in this enigmatic disorder.

Patient Characteristics
PBMC samples were collected from subjects with sarcoidosis (n = 39) and healthy controls (n = 35) (Table 1). The clinical characteristics of study patients are displayed in Table 2. Significant differences in age, gender, race and pulmonary function studies did not exist between uncomplicated and complicated sarcoidosis cases (P > 0.05 by χ² test for gender and P > 0.05 by t-test for the other characteristics). Uncomplicated sarcoidosis cases trended toward higher corticosteroid usage whereas complicated sarcoidosis cases trended toward higher methotrexate usage and were more likely to be receiving anti-TNFα therapy. However, these differences were not statistically significant (P > 0.05 for all drugs) (Table 2). Predictably, complicated pulmonary sarcoidosis cases exhibited significantly reduced pulmonary function compared to the other study groups (data not shown).

Identification of Differentially-expressed Genes in Sarcoidosis
All cases with diagnoses of cardiac, neurologic, or severe pulmonary sarcoidosis (FVC < 50%) comprised the cohort labeled as 'complicated sarcoidosis'. At the specified significance level (fold-change > 1.4, q-value < 0.05), 316 genes were differentially expressed between all sarcoidosis cases and healthy controls in the combined samples (pooled AAs and EAs). For individual populations, 118 genes were differentially expressed between all AA cases and controls, whereas 861 genes were differentially expressed between all EA cases and controls. In contrast, 1124 genes were differentially expressed between complicated sarcoidosis cases and healthy controls in the combined samples. For individual populations, 730 and 980 genes were differentially expressed between AA and EA cases with complicated sarcoidosis and healthy controls, respectively, with the TCR signaling pathway significantly enriched among complicated sarcoidosis-associated genes in both populations (adjusted P < 0.05) (Figure 1A).
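The two-part selection rule above (fold-change > 1.4 and q-value < 0.05) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used SAM in R. The gene names and values are hypothetical, and applying the fold-change cutoff symmetrically to up- and down-regulated genes is an assumption.

```python
from math import log2

FOLD_CHANGE_CUTOFF = 1.4  # threshold stated in the text
Q_VALUE_CUTOFF = 0.05     # threshold stated in the text

def is_differentially_expressed(mean_log2_cases, mean_log2_controls, q_value):
    """Return True if a gene passes both cutoffs.

    Expression values are on the log2 scale, so the fold-change is the
    difference of group means. Treating up- and down-regulation
    symmetrically is an assumption of this sketch.
    """
    log2_fold_change = mean_log2_cases - mean_log2_controls
    passes_fold_change = abs(log2_fold_change) > log2(FOLD_CHANGE_CUTOFF)
    return passes_fold_change and q_value < Q_VALUE_CUTOFF

# Hypothetical genes: (name, mean log2 in cases, mean log2 in controls, q-value)
genes = [
    ("GENE_A", 8.9, 8.1, 0.010),  # about 1.74-fold up, small q: kept
    ("GENE_B", 7.0, 7.3, 0.001),  # about 1.23-fold down: fold-change too small
    ("GENE_C", 6.0, 7.0, 0.200),  # 2-fold down, but q is not below 0.05
    ("GENE_D", 5.2, 6.0, 0.030),  # about 1.74-fold down, small q: kept
]

selected = [name for name, cases, controls, q in genes
            if is_differentially_expressed(cases, controls, q)]
print(selected)
```

Both conditions must hold, so a gene with a large fold-change but a poor q-value (GENE_C) is excluded just as a gene with a strong q-value but a small fold-change (GENE_B) is.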

Identifying a Gene Signature for Complicated Sarcoidosis
To identify a universal gene signature for complicated sarcoidosis in both AA and EA populations, an initial analysis set comprised of 1233 genes differentially expressed between AA or EA complicated sarcoidosis cases vs. healthy controls was utilized for the SVM algorithm. Figure S1 depicts the distribution of the prediction accuracy for gene signatures with the number of genes during recursive feature selection (see Supplementary Text S1 for details). A 20-gene signature (Table 3) was chosen as the most parsimonious signature with the peak prediction accuracy (Figure S1) and accurately distinguished patients with complicated sarcoidosis from healthy controls (Figures 1B and 1C), or from uncomplicated sarcoidosis (Figure 1C). Two genes within the unbiased 20-gene signature, HBEGF (heparin-binding EGF-like growth factor) and SAP30 (Sin3A-associated protein, 30kDa), were strongly up-regulated in complicated sarcoidosis whereas the remaining 18 signature genes were down-regulated in complicated sarcoidosis (Figure S1). The non-targeted 20-gene signature distinguished all sarcoidosis patients from healthy controls with an accuracy of 86.0% (sensitivity = 88.2% and specificity = 83.3%) in the combined samples (pooled AAs and EAs) (Figure 1D). The discriminative accuracy was 88.2% and 94.2% in separating sarcoidosis cases from healthy controls in AA and EA, respectively (Figure S2). When distinguishing complicated sarcoidosis cases from uncomplicated sarcoidosis cases, the accuracy was 81.4% (sensitivity = 87.0% and specificity = 74.2%) in the combined samples (Figure 1D), and was 83.7% in AA but only 64.5% in EA when the two populations were analyzed separately (Figure S2).
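The accuracy, sensitivity and specificity figures above are averages over repeated five-fold cross-validation. The following minimal Python sketch shows how such averages can be computed. It makes several assumptions: a nearest-centroid rule stands in for the linear SVM the authors ran with the e1071 library in R, the toy data are synthetic and separable by construction, and all function names are illustrative.

```python
import random

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def centroid(rows):
    """Per-feature mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid(x, centroids):
    """Stand-in classifier (an assumption): label of the closest class mean."""
    def sq_dist(cls):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[cls]))
    return min(centroids, key=sq_dist)

def repeated_cv(X, y, k=5, repeats=1000, seed=0):
    """Mean accuracy, sensitivity, specificity over repeated k-fold CV.

    Labels: 1 = case, 0 = control.
    """
    rng = random.Random(seed)
    accs, sens, specs = [], [], []
    for _ in range(repeats):
        tp = tn = fp = fn = 0
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            centroids = {cls: centroid([X[i] for i in train_idx if y[i] == cls])
                         for cls in (0, 1)}
            for i in test_idx:
                pred = nearest_centroid(X[i], centroids)
                if y[i] == 1:
                    tp += pred == 1
                    fn += pred == 0
                else:
                    tn += pred == 0
                    fp += pred == 1
        accs.append((tp + tn) / len(y))
        sens.append(tp / (tp + fn))
        specs.append(tn / (tn + fp))
    mean = lambda values: sum(values) / len(values)
    return mean(accs), mean(sens), mean(specs)

# Synthetic toy data: 20 controls near 0 and 20 cases near 5 in 3 features.
data_rng = random.Random(42)
controls = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
cases = [[5 + data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
X = controls + cases
y = [0] * 20 + [1] * 20

acc, sen, spe = repeated_cv(X, y, k=5, repeats=20)
print(f"accuracy={acc:.3f} sensitivity={sen:.3f} specificity={spe:.3f}")
```

Because the two toy groups do not overlap, every metric is 1.0 here. On real expression data the per-repeat accuracies vary with the random fold assignment, which is why the figures report a distribution with its mean.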

Evaluation of a Sarcoidosis-related TCR/JS/CCR Signaling Pathway Gene Signature
As the T cell receptor pathway (TCR), the JAK STAT signaling pathway (JS) and the cytokine-cytokine receptor signaling pathway (CCR) have all been implicated in sarcoidosis [6,26]

Validation on Independent Datasets
We evaluated the performance of our gene signatures in two independent sarcoidosis blood gene expression datasets: one (GEO: GSE19314) from the University of California, San Francisco (UCSF) [27] and another (GEO: GSE18781) from Oregon Health Sciences University (Oregon) [28]. The discriminative power was very similar between the unbiased 20-gene and the TCR/JS/CCR signatures in both datasets. The 20-gene signature classified sarcoidosis cases from healthy controls with accuracies of 75.9% and 78.3% for the UCSF and Oregon datasets, respectively, while the discriminative accuracy was 75.4% and 80.0% when the TCR/JS/CCR signature was applied to the UCSF and Oregon datasets, respectively (Figure 2). Again, principal component analysis indicated that patients with sarcoidosis can be well distinguished from healthy controls in the two independent datasets based solely on the expression of our unbiased 20-gene signature (Figure 2).

Use of Genetic Variants to Validate Sarcoidosis Gene Signatures
A genome-wide association study (GWAS) (Affymetrix 6.0 SNP array) involving 407 sarcoidosis cases including 212 AAs (including 68 complicated cases) and 195 EAs (including 46 complicated cases) was performed and allele frequencies of ∼1,300 common SNPs residing in unbiased sarcoidosis signature genes analyzed in sarcoidosis cases and healthy controls (see Supplementary Text S1 for details). At the nominal P-value < 0.01, 30 SNPs from 6 unbiased 20-gene signature genes were found to be significantly associated with sarcoidosis (Table 5), including 4 genes which overlapped between the AA and EA samples (NOG [noggin], RBM12B [RNA binding motif protein 12B], SESN3 [sestrin 3], TSHZ2 [teashirt zinc finger homeobox 2]). The most highly significant signature gene SNP in AAs was rs629508 (P = 1.7 × 10⁻³) in SESN3, whereas in EA cases, the most significant SNP was rs2618134 (P = 4.7 × 10⁻⁵) in RBM12B. Interestingly, several SNPs were also significantly associated with complicated sarcoidosis, including rs629508 (P = 5.4 × 10⁻⁵) and rs1294689 (P = 3.6 × 10⁻⁵) in the AA samples and rs10485815 (P = 2.8 × 10⁻⁵) in the EA samples (Table 5). In comparison, from ∼3,800 common SNPs residing in TCR/JS/CCR signature genes, 37 SNPs were associated with sarcoidosis in AA samples, whereas 34 SNPs were significant in EA samples (Table S1). The most highly significant TCR/JS/CCR signature gene SNP in AAs was rs2131817 (P = 1.4 × 10⁻⁵) in AKT3, whereas in EA cases, the most significant SNP was rs7614488 (P = 7.8 × 10⁻⁷) in CBLB. Several TCR/JS/CCR signature gene SNPs, rs2953040 and rs6791765 in CBLB (Cas-Br-M, murine, ecotropic retroviral transforming sequence b) and rs2131817 in AKT3, were significantly associated with sarcoidosis in both EA and AA sarcoidosis cases (P < 0.01) (Table S1).
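The comparison of allele frequencies between cases and controls described above can be illustrated with a standard allelic chi-square test on a 2×2 table. The authors' exact association test is described in Supplementary Text S1 and is not reproduced here, so this Python sketch is an assumption about the general approach, and the allele counts are hypothetical.

```python
from math import erfc, sqrt

def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele table.

    Returns (chi-square statistic, P-value, odds ratio for the minor allele).
    No continuity correction is applied.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio

# Hypothetical counts: 212 cases contribute 424 alleles, 200 controls 400.
chi2, p_value, odds_ratio = allelic_association(
    case_minor=150, case_major=274, ctrl_minor=100, ctrl_major=300)

print(f"chi2={chi2:.2f}, P={p_value:.4f}, OR={odds_ratio:.2f}")
print("passes nominal P < 0.01:", p_value < 0.01)
```

With these hypothetical counts the minor allele is enriched in cases (odds ratio about 1.64) and the SNP would pass the nominal P < 0.01 threshold used in the text. Because that threshold is nominal, some of the ∼1,300 SNPs tested would be expected to pass by chance alone.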

Discussion
The major aim of this work was to identify potential universal and racially-specific gene signatures to serve as novel biomarkers for the presence of sarcoidosis as well as for the presence of, and/or susceptibility to the development of, complicated sarcoidosis. Leveraging whole genome expression profiles in a cohort of sarcoidosis patients, an unbiased gene signature comprised of 20 autosomal genes was identified which distinguished sarcoidosis cases from healthy individuals and, importantly, differentiated patients with complicated sarcoidosis from patients with uncomplicated sarcoidosis. The 20-gene signature exhibited equivalent prediction accuracy to other sarcoidosis signatures containing a greater number of genes (such as 39-gene and 78-gene sarcoidosis signatures) with each signature superior in accuracy to signatures with fewer genes (e.g., the 10-gene signature) (Figure S1). The expression levels of the majority of these 20 signature genes showed a pattern of an additive model between uncomplicated and complicated sarcoidosis (Figure 3), i.e., when the signature gene is up-regulated, patients with complicated sarcoidosis exhibited higher expression levels than patients with uncomplicated sarcoidosis. In the sarcoidosis signature, 19 of 20 genes performed unidirectionally (up-regulation or down-regulation) in both complicated and uncomplicated sarcoidosis. Therefore, the 20-gene signature appears to not only capture differences between complicated sarcoidosis and healthy controls, but potentially convey information regarding differences between sarcoidosis cases (both complicated and uncomplicated) and healthy controls.
Gene products encoded by TCR/JS/CCR signaling pathway genes have been implicated in sarcoidosis pathogenesis [6,26] and these signature genes were enriched among the genes differentially expressed between complicated sarcoidosis cases and healthy controls in both EA and AA samples. The utility of a TCR/JS/CCR signaling pathway gene signature in classifying sarcoidosis cases was compared to the unbiased 20-gene signature. Both signatures performed with high prediction accuracy (> 80%) in distinguishing cases with sarcoidosis from healthy controls. In contrast, the prediction accuracy of the 20-gene signature was much superior to the TCR/JS/CCR signaling pathway gene signature in classifying combined AA and EA patients with complicated and uncomplicated sarcoidosis (81.4% vs. 58.8%, P < 10⁻¹⁵, t-test). We speculate that the unbiased nature of the 20-gene signature allows better capture of the characteristics of complicated sarcoidosis compared to the more restrictive TCR/JS/CCR signaling pathway signature genes. The potential role of TCR/JS/CCR signaling pathway genes in the development of sarcoidosis was confirmed by the capacity of this signature to successfully differentiate the majority of sarcoidosis cases from healthy controls. However, we speculate that either sarcoidosis disease progression or the development of complicated sarcoidosis likely requires the participation of genes and pathways extending beyond the TCR/JS/CCR pathway. These findings underscore the complex pathobiology of this disorder and imply the necessity of global and unbiased approaches. We further evaluated the classification accuracy of the 20-gene sarcoid signature separately in EA and AA samples and found the 20-gene signature to demonstrate > 85% accuracy for classifying either EA or AA sarcoidosis cases (complicated and uncomplicated) from healthy controls. In contrast, the 20-gene sarcoidosis signature differentiated complicated sarcoidosis and uncomplicated sarcoidosis cases with an accuracy > 80% in AA cases, but only ∼60% in EA cases, potentially reflecting the relatively smaller complicated EA sample size or a bias for AA expression dysregulation driven by greater genetic variation, an issue which requires further examination. Both the 20-gene signature and the TCR/JS/CCR gene signature successfully discriminated sarcoidosis cases from IPF patients with similar prediction accuracies, reflecting the differences in immunopathogenesis, clinical course, prognosis, and response to steroid treatment [29] in these two fibrotic lung disorders. This finding may imply additional clinical utility of the signature as a diagnostic biomarker for sarcoidosis.
Figure 1 (continued). or the TCR/JS/CCR signaling pathway gene signature. Left panel: all sarcoidosis patients versus healthy controls; and right panel: patients with complicated sarcoidosis versus patients with uncomplicated sarcoidosis. doi:10.1371/journal.pone.0044818.g001
As evidenced by the paucity of PubMed citations (PubMatrix results), the 20-gene signature is comprised of highly novel candidate genes in sarcoidosis susceptibility and severity of disease. As a complementary method to validate our findings [30][31][32][33][34][35], we examined the allele frequencies of both unbiased 20-gene sarcoidosis signature single nucleotide polymorphisms (SNPs) as well as TCR/JS/CCR signaling pathway signature gene SNPs in sarcoidosis cases and healthy controls embedded within a GWAS dataset constructed by genome-wide assessment of genetic variants in over 400 EAs and AAs with sarcoidosis. As genetic variants, such as SNPs and copy number variants (CNVs), contribute significantly to variations in gene expression, SNPs were annotated to the genomic regions of these signature genes (based on the Affymetrix annotation) and, therefore, potentially contribute to gene expression variation by acting as cis-eQTLs. From ∼1,300 SNPs in our 20 signature genes, we identified 30 SNPs (corresponding to 6 signature genes) which were significantly associated with sarcoidosis in either EA or AA samples, suggesting a potential role of these cis-acting SNPs in regulating the expression of sarcoidosis signature genes. Similarly, from ∼3,800 SNPs in TCR/JS/CCR signature genes, relationships between SNPs and sarcoidosis were observed. While these findings serve to validate the potential importance and relevance of signature genes, a direct association between these SNPs and expression is necessary to validate these relationships. Our results suggest that genetic variants via cis-acting eQTLs may contribute to the variation in expression of sarcoidosis signature genes. We further recognize that additional factors, such as trans-acting eQTLs, environmental factors, or epigenetic pathways, may contribute substantially to signature gene expression variation. Further investigations involving genome-wide genotypic data (e.g., for mapping trans-acting eQTLs) and expression data on the same samples could potentially provide greater insights into the contribution of genetics to the identified gene signature.
Quantitative abnormalities in T cells have been described in the peripheral blood of patients with sarcoidosis [36] with significant lymphopenia, involving CD4, CD8, and CD19 positive cells, common in sarcoidosis patients and correlating with disease severity [37]. Individual signature genes may not only have a role in the pathophysiology of sarcoidosis but could potentially be approached as novel therapeutic targets for the disease. For example, HBEGF, a member of the EGF family of growth factors, is a potent mitogen and chemoattractant for many cell types including fibroblasts, smooth muscle cells and epithelial cells [38][39][40][41]. A substantial body of evidence suggests that HBEGF plays a role in wound healing and response to injury [42][43][44][45], leading to speculation that HBEGF may represent a target involved in the pathobiology of chronic lung sarcoidosis and a novel therapeutic target, an observation supported by the PubMatrix search results.
Table 5. SNPs significantly associated with sarcoidosis within the unbiased 20 signature genes (P < 0.01).

Within our 20-gene signature, LOC100132356 was the most cited in the PubMed literature, though it encodes only a hypothetical protein. This gene was linked to terms such as sarcoidosis, tuberculosis, granulomatous disease, hypersensitivity pneumonitis, and pulmonary fibrosis. However, the detailed function of this gene is still unclear.
Recently, lung gene expression profiles were compared between patients with self-limiting sarcoidosis and those with progressive restrictive fibrotic disease [46] with a greater number of down-regulated genes versus up-regulated genes identified in patients with progressive pulmonary sarcoidosis. These findings are highly consistent with the expression profile of our signature genes in patients with complicated sarcoidosis. Interestingly, we failed to identify any overlap between sarcoidosis signature genes and the differentially expressed genes produced by comparison of self-limited and progressive lung sarcoidosis. The lack of overlap may reflect greater severity of disease in our cohort with cardiac and neurologic sarcoidosis in addition to cases with severe lung disease. In addition, our studies did not involve lung tissue expression but rather analysis of PBMCs and therefore tissue-specific expression may also contribute to this lack of overlap.
Furthermore, our sarcoidosis gene signatures performed well in two independent validation cohorts (UCSF and Oregon) [27,28]. We should point out two challenges in our validation. Firstly, our microarray platform (Affymetrix Human Exon 1.0 ST Array) was different from that used for the validation cohorts (Affymetrix Human Genome U133 Plus 2.0 Array). Secondly, our study focused on gene expression in PBMCs while whole blood expression profiles were analyzed for the UCSF cohort [27].
In summary, despite significant limitations including the relatively small number of EA complicated cases in the analysis set, an unbiased 20-gene molecular signature was identified as a potential novel molecular biomarker in the diagnosis of sarcoidosis as well as for the presence of complicated sarcoidosis, with substantial accuracy in both EA and AA sarcoidosis cases. With validation in a replicate sarcoidosis cohort and testing against other granulomatous disorders like Wegener's disease, hypersensitivity pneumonitis, and tuberculosis, this sarcoidosis gene signature may represent a novel universal gene signature for complicated sarcoidosis and serve as a springboard to individualized therapies in this enigmatic disorder.
Each number in Table 6 represents the count of publications containing the corresponding gene name and search term. doi:10.1371/journal.pone.0044818.t006

Subjects and PBMC Samples
The study was approved by the Institutional Review Board (IRB) of the University of Illinois at Chicago (UIC) with written informed consent obtained from all subjects. The UIC's IRB committee members (Chairs) include: Indru Punwani, D.D.S., Susan Labott, Ph.D., Paul Heckerling, M.D., and Kathryn Rugen, Ph.D. The DNA samples provided by the Johns Hopkins University investigators, and their use in this study, were approved by the IRB of the Johns Hopkins University. PBMC samples were collected from subjects with sarcoidosis (n = 39) and healthy controls (n = 35) (Table 1). The diagnosis of sarcoidosis was based on established joint international criteria [47]. Subjects with other concurrent systemic inflammatory diseases were excluded. A total of 29 African descent American (AA) and 10 European descent American (EA) patients with sarcoidosis were included in the overall sarcoidosis cohort with 18 AA and 4 EA patients diagnosed with complicated sarcoidosis defined as cardiac sarcoidosis (e.g., ventricular arrhythmias) [3], neurologic sarcoid (e.g., evidence of hyperdense MRI lesions) [4] or severe pulmonary sarcoidosis (FVC < 50%). A detailed description of the therapy status of each patient is listed in Table S3.

RNA Microarray Hybridization
Total RNA was isolated from PBMCs using standard molecular biology protocols (n = 74) without DNA contamination or RNA degradation. Sample processing (e.g., cDNA generation, fragmentation, end labeling, hybridization to Affymetrix GeneChip Human Exon 1.0 ST arrays) was performed by the University of Chicago Functional Genomics Facility per manufacturer's instructions.

Identification of Genes Differentially Expressed in Sarcoidosis and Complicated Sarcoidosis
Human Exon 1.0 ST arrays were summarized using the Affymetrix Power Tools v.1.12.0 (http://www.affymetrix.com/) (see Supplementary Text S1 for details). The microarray data have been uploaded to the NCBI GEO database (GEO accession number: GSE37912). Genes on chromosomes X and Y were removed to avoid the potential confounding factor of gender. SAM (Significance Analysis of Microarrays) [48], implemented in the samr library of the R Statistical Package [49], was used to compare log2-transformed gene expression levels between patients with complicated sarcoidosis and normal controls in the combined (AA and EA), EA, and AA samples, respectively. False discovery rate (FDR) was controlled using the q-value method [50]. Transcripts with a fold-change greater than 1.4 and q-value less than 0.05 were deemed differentially expressed. We searched for any enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) [51] physiological pathways among the differential genes relative to the final analysis set using NIH/DAVID [52,53]. An adjusted P-value < 0.05 after the Benjamini-Hochberg procedure [54] was used as the cutoff.
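The Benjamini-Hochberg adjustment applied to the pathway enrichment P-values can be illustrated with a short Python sketch. This is the standard step-up procedure written from its textbook definition, not the DAVID implementation, and the input P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order.

    Each P-value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downward keeps the result monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw enrichment P-values for five pathways.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adjusted) if p < 0.05]

print([round(p, 5) for p in adjusted])
print("pathways passing adjusted P < 0.05:", significant)
```

In this example the third and fourth pathways have raw P-values below 0.05 but adjusted values just above it, so only the first two would be reported as enriched.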

Identification of Gene Signature for Classifying Sarcoidosis and Complicated Sarcoidosis
To identify gene signatures useful in the diagnosis and classification of sarcoidosis, a machine learning algorithm based on a support vector machine (SVM) with a linear kernel was applied in combination with recursive feature elimination (RFE) to generate a predictive model (see Supplementary Text S1 for details) [55][56][57][58]. The e1071 library of the R Statistical Package [49] was used to conduct SVM and RFE. In each round of RFE, the SVM linear classifier was trained on the pooled samples from both AA and EA, including all the healthy controls and sarcoidosis patients. The gene signature that was comprised of the smallest number of genes with significant peak prediction accuracy was used in subsequent analyses. To test the performance of our gene signature, five-fold cross-validation was repeated 1,000 times using SVM. In addition, the gene signature was also tested for classification accuracy in AA and EA samples separately. We also used two independent sarcoidosis datasets generated on different microarray platforms [27,28] to validate our gene signature.

Table S1 SNPs significantly associated with sarcoidosis within the 31 TCR/JS/CCR signature genes (P < 0.01). (PDF)
Table S2 PubMatrix search results for the TCR/JS/CCR signature genes against sarcoidosis-related search terms.

Figure 1. Identifying gene signatures in sarcoidosis. Panel A. Enriched pathways among complicated sarcoidosis-associated genes. The top ranking KEGG pathways are listed for each population. The red line indicates the cutoff of significance (adjusted P-value < 0.05). The number of genes in each pathway is shown beside the pathway name. Panel B. Heatmap of patients with complicated sarcoidosis and healthy controls. Red represents increased gene expression; blue represents down-regulation. "++": patients with complicated sarcoidosis; "−": healthy controls. Panel C. Principal component analysis on expression values of the 20-gene signature. X-axis: principal component 1 with eigenvalue; Y-axis: principal component 2 with eigenvalue. Left panel: patients with complicated sarcoidosis and healthy controls; middle panel: patients with complicated sarcoidosis, uncomplicated sarcoidosis and healthy controls; and right panel: patients with complicated sarcoidosis and uncomplicated sarcoidosis. HC: healthy controls; US: patients with uncomplicated sarcoidosis; and CS: patients with complicated sarcoidosis. Panel D. Comparison between the 20-gene signature and the TCR/JS/CCR signaling pathway gene signature. The distribution of prediction accuracy is based on five-fold cross-validation repeated 1,000 times. The dashed lines indicate the average classification accuracy for the 20-gene signature

Figure 2. Validation in independent datasets. The upper panels show the comparison between the 20-gene signature and the TCR/JS/CCR signaling pathway gene signature. The distribution of prediction accuracy is based on five-fold cross-validation repeated 1,000 times. The dashed lines indicate the average classification accuracy for the 20-gene signature or the TCR/JS/CCR signaling pathway gene signature. The lower panels show the results of principal component analysis on expression values of the 20-gene signature. X-axis: principal component 1 with eigenvalue; Y-axis: principal component 2 with eigenvalue. doi:10.1371/journal.pone.0044818.g002
Table 5 (footnote): results between complicated and uncomplicated sarcoidosis were listed only for the SNPs with P < 0.01. OR: odds ratio. doi:10.1371/journal.pone.0044818.t005

Figure S1 Distribution of the classification accuracy in each RFE step. X-axis: the number of genes in each step; Y-axis: the classification accuracy from a five-fold cross-validation (repeated 1,000 times). The red line shows the average accuracy for each RFE step. (PDF)
Figure S2 Distribution of classification accuracies of the 20-gene signature. X-axis: the classification accuracy from a five-fold cross-validation (repeated 1,000 times). The dashed lines indicate the average classification accuracy. (A) All sarcoidosis patients versus healthy controls in the AA samples; (B) Patients with complicated sarcoidosis versus patients with uncomplicated sarcoidosis in the AA samples; (C) All sarcoidosis patients versus healthy controls in the EA samples; and (D) Patients with complicated sarcoidosis versus patients with uncomplicated sarcoidosis in the EA samples. (PDF)
Figure S3 Comparison between the 20-gene signature and the TCR/JS/CCR signaling pathway gene signature in individual populations. The distribution of accuracy is based on five-fold cross-validation repeated 1,000 times. The dashed lines indicate the average classification accuracy for the 20-gene signature or the TCR/JS/CCR signaling pathway gene signature. HC: healthy controls; US: patients with uncomplicated sarcoidosis; and CS: patients with complicated sarcoidosis. (PDF)
Figure S4 Capability of the 20-gene signature and the TCR/JS/CCR signaling pathway gene signature in separating sarcoidosis patients from IPF patients. The distribution of accuracy is based on five-fold cross-validation repeated 1,000 times. The dashed lines indicate the average classification accuracy for the 20-gene signature or the TCR/JS/CCR signaling pathway gene signature. (PDF)

Figure 3. Boxplot of expression of the 20 signature genes. The dark grey points and lines indicate the geometric mean of expression in each category. HC: healthy controls; US: patients with uncomplicated sarcoidosis; and CS: patients with complicated sarcoidosis. Y-axis: log2-transformed expression values. doi:10.1371/journal.pone.0044818.g003

Table 1.
Study subjects with racial and complication status.

Table 2.
Patient characteristics and concomitant medications.

Table 3.
The unbiased 20-gene signature for complicated sarcoidosis.
Here, the weight of each gene represents the frequency of the gene being selected during the last round of the RFE procedure. doi:10.1371/journal.pone.0044818.t003

Table 6.
PubMatrix search results for the 20-gene signature against sarcoidosis-related search terms.