A systems genomics approach uncovers molecular associates of RSV severity

Respiratory syncytial virus (RSV) infection results in millions of hospitalizations and thousands of deaths each year. Variations in the adaptive and innate immune response appear to be associated with RSV severity. To investigate the host response to RSV infection in infants, we performed a systems-level study of RSV pathophysiology, incorporating high-throughput measurements of the peripheral innate and adaptive immune systems and the airway epithelium and microbiota. We implemented a novel multi-omic data integration method based on multilayered principal component analysis, penalized regression, and feature weight back-propagation, which enabled us to identify cellular pathways associated with RSV severity. In both airway and immune cells, we found an association between RSV severity and activation of pathways controlling Th17 and acute phase response signaling, as well as inhibition of B cell receptor signaling. Dysregulation of both the humoral and mucosal response to RSV may play a critical role in determining illness severity.


Introduction
Respiratory syncytial virus (RSV), a negative strand RNA virus in the Pneumoviridae family, is a major cause of respiratory illness affecting persons of all ages, especially newborn infants [1][2][3]. Although the majority of infections are relatively mild, RSV remains the most common cause of hospitalization for pneumonia and severe pneumonia in infants and young children in both the developed and developing world [4][5][6]. In the US half of newborns are infected in their first winter, with 1-3% hospitalized, 4-7% seen in emergency departments, and 10-16% seen in physician offices [7].
A number of well-defined host factors predisposing infants to severe disease include prematurity, congenital cardiac and neuromuscular disease, and low levels of maternally derived neutralizing antibody [3,8]. More recent studies have also found genetic polymorphisms in cytokine and chemokine genes, altered innate interferon responses in the respiratory tract, T cell responses favoring a Th2 and Th17 bias, and the composition of the nasal microbiota to be associated with more severe illness [8][9][10][11][12][13][14][15]. Although each of these factors offer insight into the complex nature of RSV infection in young infants, they have generally been analyzed independently; thus, it is difficult to assess their interactions and relative importance to disease outcome.
Previous multi-omic analyses of RSV, by our group and others, demonstrated the potential of integrative analyses to further our understanding of the biological mechanisms underlying RSV disease progression and severity. To address limitations of prior studies, we designed a systems-level study of RSV pathophysiology in a precisely defined population of low risk newborns with the full spectrum of disease severity [16]. We studied purified populations of CD4 +, CD8+ and CD19+ cells, as mixed PBMCs have been used extensively in studies of illness severity for respiratory infections [9,11,[17][18][19]. We reasoned that the local airway response would be a key component to defining illness severity, by contributing to the ability of the host to control or localize the infection [20]. We also included data from the nasal microbiota, as recent studies have indicated colonization at the time of viral infection may significantly influence illness severity [9].
This study builds on these previous studies both in the scope of the data and in the methodology developed to analyze these data. By modeling the connections between these highthroughput data and clinical RSV severity, we are able to reconstruct the intricate relationships among different data types and demonstrate the potential of integrative analyses to identify shared and cell type specific cellular pathways associated with RSV severity.

Results
The data presented in this manuscript were generated as part of the Assessing Prediction of Infant Respiratory Syncytial Virus Effects and Severity (AsPIRES) study, which sought to identify host, viral, and environmental factors associated with RSV disease severity [21]. A total of 139 infants with RT-PCR confirmed RSV illness were enrolled, of which 134 had omics data of sufficient quality for analysis (demographic information for this cohort is shown in Table A in S1 Text). Venous blood, nasal microbiota and nasal epithelial cell samples were collected for high-throughput molecular analysis (Fig 1). Illness severity was measured using the Global Respiratory Severity Score (GRSS) [22], which quantifies the full spectrum of primary RSV disease severity using nine clinical variables in a weighted score. We employ a novel approach to the integration and analysis of five high dimensional omic data modalities: the nasal epithelial transcriptome, the transcriptome of CD4, CD8, and CD19 cells from peripheral blood, and the nasal microbiome.

Integrated method development
Preliminary exploratory analyses of individual data types found that most individual features (e.g. a single gene measured in CD4 cells) have relatively weak correlation with GRSS (median absolute correlation 0.08-0.11). Additionally, we observed strong correlation among features within a data type, which are typical of transcriptomic data. These two observations motivated us to use Principal Components Analysis (PCA) to aggregate numerous "weak features" in the hopes of identifying a few key latent factors for each data type. Screeplots from performing PCA on each data type showed that a small number of Principal Components (PCs) explain the vast majority of the variation in each data type (S1 Fig). This further supports our observation of strong correlation among features within each data type. Furthermore, we applied a secondary PCA on the PCs produced by each data type and found shared variation between data types. Specifically, secondary PCs contained primary PCs from different data types, and the number of secondary PCs was far less than the number of primary PCs. These initial analyses motivated the methods of data integration we describe in this paper.
Due to practical limitations, only a small subset of subjects (23 out of 106) have all five high dimensional data types (S2 Fig). Consequently, we decided to conduct integrative analyses of several combinations of omics data. The first set of data modalities was chosen to interrogate three disparate putative determinants of RSV severity, the nasal epithelial transcriptome (NT), the nasal microbiome (NM), and the adaptive immune response measured in peripheral blood CD4 cells (CD4). These data were available for 61 subjects. The second data integration focused on the collective role of the adaptive and innate immune response in RSV severity, measured in CD4, CD8, and CD19 cells isolated from peripheral blood. These data were available for 35 subjects. Demographic information for these sub-cohorts is shown in Tables B & C in S1 Text.
We proposed five related integrative analytic methods based on multilayer PCA and regularized regression based variable selection (Fig 2), see Materials and Methods for details. Based on extensive cross-validation (CV) experiments, we found that PCA of transcriptomic data, followed by an integrative elastic-net regression model with the transcriptomic PCs and individual operational taxonomic units (OTUs) of nasal microbiome data (Fig 2, Method 1) achieved better GRSS prediction accuracy than the other methods (Table D in S1 Text).
We also applied a similar approach, initial PCA-based dimension reduction followed by regularized regression, to individual data types and found that the integrative models significantly out-performed the single data type models in terms of cross-validated prediction accuracy (Fig 3). Specifically, while the integrated and single data type models are all approximately unbiased, the integrated models had substantially smaller mean squared error (MSE).

Model application and interpretation
We previously reported gene expression correlates of clinical disease severity in RSV infected infants [11,23]. To gain further insight, we generated multiple "models" by integrating unbiased, comprehensive gene expression data from both the humoral and mucosal compartments as described above. We hypothesized that a "systems level" analytical approach would provide distinct biological insights into disease pathophysiology. We first focused upon humoral responses specifically characterized in CD4 T cells sorted from peripheral blood collected during acute illness, in two models containing gene expression data from these cells (Fig 4). Following data integration, using RSV-associated disease severity as the outcome, the modeled weights (see "Feature weight calculation" in S1 Text for more details) for expression of individual genes is displayed in word clouds (upper left), and unweighted gene expression values are displayed in a heatmap (lower left). Weights for individual genes are clearly different between the two models, as evident from the word clouds, and could be expected due to the different subsets of the cohort included in each model. Interestingly, CD101, one of the highest weighted (absolute value 0.0048) genes in both models, plays a role as an inhibitor of T cell proliferation induced by CD3. Furthermore, unweighted gene expression in these integrated models are not fundamentally different between mild and severe subjects. These observations support the novel and distinct insight derived from our new integrated modeling approach to identify gene expression correlates of disease severity. The panel on the left shows the difference between the predicted and observed GRSS for a model that integrates the nasal epithelial transcriptome, peripheral blood CD4 transcriptome, and nasal microbiome, as well as for models using just one of these data types. Similarly, the panel on the right compares a model that integrates the transcriptomes of 3 immune cell types measured in peripheral blood with models using just one of these data types. In both cases, integration increases the precision of the GRSS predictions.

PLOS COMPUTATIONAL BIOLOGY
To minimize model-specific variation, we focused our interpretation of these data at the pathway level (middle of Figs 4 and S3-S12). We found a high degree of model convergence, with a majority (95%) of pathways consistently, significantly activated or inhibited (pvalue < 0.05) in association with disease severity. Significantly activated pathways were associated with hypoxia signaling, nucleotide salvage and telomerase signaling pathways. Of particular interest, many significantly inhibited (p-value < 0.05) pathways were associated with T helper (Th) cell signaling (CD28 and NFAT signaling) and differentiation of Th subtypes (Th1 and TH17 regulation).
We next attempted to predict key regulatory events associated with disease severity, including transcription factors and intracellular signal transduction molecules, that could be driving the global gene expression responses. Upstream regulator analysis using Ingenuity suggested that MYC and PI3K were associated with many of the significantly regulated (p-value < 0.05) pathways (Fig 4). Four sets of genes were selected based on their model weights: the top 200 genes, genes in the first quantile, genes in the second quantile, and all genes. Promoter analysis of these three sets were performed, using high quality transcription factor binding sites conserved across human, mice and rat genomes. The analysis revealed that multiple T cell differentiation pathways with consistent evidence for inhibition (CD28 signaling, Th17 activation, Th1 and T cell exhaustion signaling) are under the transcriptional control of PRDM1 (Fig 4  upper right). Interestingly, PRDM1 is known to modulate peripheral T cell activation and proliferation, promote T helper (Th2) lineage commitment and limit Th1/Th17/Tfh cell differentiation. Our results implicate inhibition of T cell differentiation, towards the Th2/Th17 phenotype in particular, as a putative mediator of severe illness.
We rationalized that the mucosal response would be distinct from the humoral response and would reflect the pathophysiology of the disease target organ, as suggested by our prior PLOS COMPUTATIONAL BIOLOGY studies [11,24]. Therefore, we next focused on interpreting the gene weights of nasal samples from the comprehensive model (CM). Similar to the CD4 data presented in Fig  Highly weighted genes (absolute value greater than 0.0003) that were positively correlated with disease severity include BTN2A2, which inhibits the proliferation of CD4 and CD8 T-cells, T-cell metabolism and IFN secretion, and HOMER1, which negatively regulates T cell activation by inhibiting NFAT pathway.
We next completed pathway level analysis of the comprehensive model (CM) nasal gene expression weights (Figs 5, center and S3-S12). The analysis uncovered a decrease in multiple pathways driven by both retinoic acid-related (e.g., LXR/RXR and PPAR/RXR signaling) and p53-related (e.g., NF-kappaB) signaling were associated with severe disease. Further analysis of these genes and pathways identified significant evidence (p-value < 0.05) for changes in RXR and REL/A transcription factor regulation ( Fig 5, upper right). Interestingly, this analysis indicated increased activation of pathways which were focused on regulation of the immune system were also associated with severe disease; in particular those associated with Th2 and Th17 CD4 T cells. Remarkably, upstream regulator analysis suggested this may be due to increases in the expression of IL4 and IL17A (Fig 5, lower right). Again, as for CD4 T cell data, our ability to use this integrated modeling approach to identify evidence for pathophysiologically relevant interactions between the mucosal and humoral responses supports its methodological validity.
Finally, we assessed unique and consistent responses in the mucosal and humoral compartments as indicated by our integrated models (CM and LM). Pathway-based analysis of the

PLOS COMPUTATIONAL BIOLOGY
weights derived from CD8 T cells indicated unique activation of cytotoxic responses including those related to classical CD8 T cell functions were associated with severe disease. CD8 T cells also demonstrated unique activation of TGFb and TNFR signaling in severe disease. Conversely, analysis of the weights derived from CD19 B cells indicated regulation of multiple, alternate pathways. CD19 B cells displayed evidence for unique activation of PLC, and unique inhibition of PI3K/AKT signaling, among others. Pathway-based analysis of the weights also identified a number of responses that were conserved across all lymphocytes (e.g., CD4, CD8, CD19) and associated with disease severity. Consistently activated pathways indicated broad increases in oxidative phosphorylation, nucleotide salvage and sumoylation. Significantly inhibited pathways indicated broad reductions in lymphocyte proliferation, activation (e.g., iCOS and NFAT) and surtuin signaling. Finally, we looked for pathways which were consistently identified not just in lymphocytes, but across all humoral and mucosal data sets (Fig 6). This analysis indicated activation of pathways controlling Th17 and acute phase response signaling, as well as consistent inhibition of B cell receptor signaling, are consistently associated with disease severity in all cell types studied.
Two alternative integration models were also considered in our study: one integrating CD4 and CD19 lymphocytes (S13-S16 Figs) and another integrating nasal epithelial cells with CD4 and CD19 lymphocytes (S17-S22 Figs). These additional models produced similar pathway level results for each transcriptomic data type, suggesting a degree of robustness to our approach to data integration and analysis and support our decision to focus on a comprehensive model of nasal epithelial cells, nasal microbiota, and CD4 lymphocytes (CM) and a model focused on three lymphocyte cell types (LM).

Discussion
In summary, we conducted a multi-omics study on infant RSV infection. We demonstrated that a multi-layer statistical learning framework was better at predicting disease severity than

PLOS COMPUTATIONAL BIOLOGY
comparable single-layer approaches; and integrating multiple omics datasets provided with us better prediction accuracy of disease severity than predictors built from any single dataset. In addition, based on the trained sparse linear predictors, we were able to assess the contribution of individual genes/microbes by quantitative weights, which facilitate the biological interpretations of these predictive models.
Previous studies, including our own, utilized transcriptomic analysis of single sample types such as whole blood, purified T cells, or nasal secretions to investigate RSV disease pathogenesis. These analyses found an association between increased disease severity and both T and B cell suppression, evidence of systemic Th2 skewed T cell responses, alterations in systemic and local interferon responses, and the potential influence of the local microbiome on these responses. In the current work, by integrating gene expression data from multiple cell types from the peripheral blood and the respiratory epithelium with the complex microbiome of the upper respiratory tract, we confirmed the deregulation of the immune profile associated with RSV disease severity. Specifically, the results demonstrated that Th2 and Th17 activation, and inhibition of Th1 pathways dominate the T cell response. In addition, there was evidence of B cell suppression in the airway of infants with severe RSV. The results also demonstrated that inclusion of the microbiome, specifically Haemophilus influenzae, was informative for understanding a complete picture of RSV disease pathogenesis in young infants. Because our microbiome analysis did not identify Streptococcus pneumoniae, we were unable to confirm its influence. As described in the results, our analysis identified many severity-associated pathways that were activated or suppressed during infection, especially those indicating immune suppression. As shown in Fig 5, the airway cells contributed importantly to this inhibition. Airway epithelial cell expression identified BTN2A2 as an influential gene and was important in inhibiting CD4 and CD8 cells and interferon suppression.
There are several notable limitations to this study. First, the sample size was limited due to the complex nature of the study, challenges enrolling infants with RSV, and the need for multiple visits to obtain samples. In the pathway and transcription factors analyses, we assessed statistical significance using a p-value threshold of 0.05 without adjustment for multiple testing; therefore, these results should not be viewed as definitive. The same factors that contributed to the small sample size also contributed to a second limitation: the potentially non-random missingness of certain omic data types for some participants. However, we believe that this latter limitation represents an opportunity for future methods development in multi-omic data integration. Second, a common limitation of clinical genomic studies is that sample are collected over time, which has the potential to introduce technical variation. Third, samples were collected at specific time points; therefore, it is possible that some changes in cellular function that were not temporally proximal were not captured.
Due to the high-throughput nature of omics data, there could be thousands of features that are potentially associative with biological endpoints. Besides, there exist an astronomically large number of possible interdependences among those omics data. Consequently, we believe that the most fundamental challenge in multi-omics studies such as ours is to reduce the complexity in statistical models in a sensible way, such that most spurious correlations are removed yet the major modes of informative relationships are retained.
Last but not the least, we believe our proposed method has better interpretability than those machine learning algorithms based on deep neural networks. In fact, the two methodologies share one architectural similarity, namely, both are multi-layered feedforward systems (see Fig  2). Due to the deliberate choice of using linear activation function (via PCA) and linear output function (via glmnet regression), we were able to design a multilayer backpropagation algorithm that translates the weights at the output layer, i.e., linear coefficients in the trained predictor of GRSS, to mathematically equivalent weights at the input layer, i.e., individual genes and microbes. In a sense, the absolute value of a particular weight represents the contribution of this gene/microbe in predicting the GRSS. We performed gene set enrichment analyses based on these weights and discovered both cell-type specific and shared severity-associated biological pathways. In contrast, it is generally not possible to obtain such a direct relationship between the output and input layer in feedforward neural networks with nonlinear activation functions, which is why these algorithms are sometimes referred to as "blackbox methods".
Arguably, several components in our integrative analysis pipeline may be replaced by other more specialized methods. For example, instead of using standard PCA, we could consider probabilistic PCA [25] or sparse PCA [26]. The former is more robust to outliers and missing values due to the use of a ridge-like regularization term; and the latter is not only more consistent for "large p, small n" data than the standard PCA, it also has better interpretability because it can produce sparse loading scores. In addition, there is a host of recently developed variants of PCA that are advantageous in various situations, as summarized in [27].
Notwithstanding their advantages, these advanced dimension reduction methods are: (a) computationally more demanding; and (b) dependent on the tuning parameter that may have to be selected by cross-validation, which adds more computing cost and uncertainties in the analyses. In the future, we plan to systematically evaluate the impact of different dimension reduction strategies to the second stage integrative analyses and make algorithmic adaptations to improve their computational efficiency in high-throughput data analysis if necessary.
Another potentially rewarding future research direction is to design better statistical methods optimized for "incompletely matched" data like ours, in which all subjects do not have all types of data. In fact, as seen in S2 Fig, only 23 subjects have all five types of high-throughput data, thus we decided to build integrative models for several combinations of data separately. Development of efficient and unbiased high-dimensional imputation methods in the future may allow us to fully integrate all available data and improve the accuracy of the predictive models.
What we present in this paper is a holistic approach to understanding the response to RSV infection during infancy, and the system-level correlates of clinical outcomes. The data provide a better resolution for accurately and sensitively identifying molecular changes associated with illness severity, and also uncover specific and robust changes that may be easily detectable using distinct biospecimens (e.g., airway, T cells, B cells, etc.). These novel data can guide future efforts to identify sensitive and specific biomarkers for identifying or predicting outcome following infant RSV infection. They may also be useful as biomarkers to inform the efficacy of future interventions (e.g., therapies) or preventative measures to suppress the rate of severe disease (e.g., vaccines). For example, our approach could potentially be used to quantify the response to novel RSV vaccines, either live attenuated or subunit vaccines, to be certain they do not mimic responses that may lead to a pathologic state [28,29].

Ethics statement
The Research Subject Review Board of the University of Rochester and Rochester General Hospital approved the study (IRB#42632), and all parents provided written informed consent.

Study population and sample collection
The full description of the AsPIRES study has been published [21]. The Research Subject Review Board approved the study and all families provided informed consent. Briefly, three groups of previously healthy term infants with primary RSV infection were evaluated during three winter seasons from 2012 to 2015 in Rochester, NY. RSV infection was identified in hospitalized infants, outpatients brought to medical attention either at the Emergency Department or primary care offices for respiratory symptoms, and infants in a birth cohort followed prospectively in their homes for RSV infection. From these three groups, infants with a range of RSV disease severity were included.
Infants were evaluated at three time points: an acute illness visit at diagnosis, a second visit 14 days after illness onset, and a convalescent visit 28 days after illness onset. At each visit symptoms were recorded, and a physical exam was performed. At the first and third visit a flocked swab (Copan) was used to obtain a nasal specimen from one nares nares for microbiological testing, a nasal wash was performed on the contralateral nares to remove mucus and debris followed by the use of a second flocked swab to obtain epithelial cells by brushing the mucosa at the level of the inferior turbinate. Venous blood (~2-3 ml) was collected at each visit. RSV infections were confirmed by quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) assay at diagnosis.
RSV disease severity was measured using the Global Respiratory Severity Score (GRSS), calculated using nine weighted clinical variables (general appearance; presence of wheezing, rales, retractions, cyanosis, lethargy, or poor air movement; maximal age-adjusted respiratory rate; and worst room-air oxygen saturation) yielding a score of 1 through 10 [22].

RNA processing
Four types of RNA-seq data were used in this study: NT (nasal transcriptome), CD4, CD8, and CD19. Technical details for recovering nasal RNA can be found in [30]. Briefly, following flushing of the nares with saline to remove mucus and cellular debris, a flocked swab was used to recover cells at the level of the turbinates. The swab was immediately placed in RNA stabilizer (RNAprotect, Qiagen, Germantown, MD) and stored at 4˚C. Cells were recovered by filtering through a 0.45 uM membrane filter. Recovered cells were lysed and homogenized using the AbsolutelyRNA Miniprep kit (Agilent, Santa Clara, CA) according to the manufacturer's instructions.
CD4, CD8, and CD19 were mRNA expression profiles of the corresponding cell populations purified from peripheral blood as previously described [11,31]. Specifically, heparinized blood was maintained at room temperature for up to 2 hours, and peripheral blood mononuclear cells were isolated by Ficoll-hypaque gradient, flow-sorted into these three subsets of cells, and stored in RNA lysis buffer at −20˚C.
For all four types of RNA-seq data, sequencing libraries were constructed using the Nexter-aXT library kit (Illumina, San Diego, CA) and then sequenced on the Illumina HiSeq2500 platform. Sequences were aligned against human genome version of hg19 using STARv2.5, counted with HTSeq, and normalized by Fragments Per Kilobase of transcript per Million mapped reads (FPKM). A small subset of samples with very low yields or very low correlation with other samples were removed from the subsequent analyses (Table E in S1 Text), and we applied non-specific filtering based on both mean expression values and inter-quartile range (IQR) to identify subsets of genes for further investigation. We winsorized potential outliers at the gene-level, and then tested the correlation between gene expressions and the GRSS by Pearson correlation test. For each type of transcriptomic data, we were able to select several hundreds of potentially informative features at 0.05 significance level, as summarized in Table F in S1 Text. Additional technical details on data preprocessing can be found in S1 Text.

Microbiome processing
Bacterial 16 S rRNA from nasal swab specimens was extracted, amplified, and sequenced, and the resulting data were used to determine the taxonomic compositions, in terms of the relative abundances of those present operational taxonomic units (OTUs). Briefly, the V3-V4 hypervariable regions were targeted for amplification and sequenced using an Illumina MiSeq platform according to a paired end 2 × 300 bp read protocol. Preliminary read processing and quality control were performed using the Quantitative Insights into Microbial Ecology (QIIME) software package [32,33], and a closed-reference OTU picking was done with USEARCH and the GreenGenes reference database [34]. The initial microbiome data contained information for 148 distinct OTUs at the genus-level across 104 samples. Among them, only 15 genera had nonzero abundance level for more than half (n = 52) of the subjects. These features were selected for the integrative analyses. A full list of them is provided in Table G in S1 Text.

Methods of data integration
We considered five methods of data integration, all of which shared three common components: a uni-or multi-layered dimension reduction to select a manageable set of features from various types of data, and elastic-net regularized regression to integrate these features into one weighted score to predict GRSS, our main outcome variable. Elastic-net regularized regression uses both L 1 (LASSO) and L 2 (ridge) penalties to produce a sparse linear predictive model and is known to be numerically stable for high-dimensional data. It is implemented by the R package glmnet [35]. Regularization parameters were selected by an initial ten-fold cross-validation. After we obtained a sparse regression model, we re-estimated the linear coefficients by OLS-based procedures to improve the accuracy of modeling fitting. This parameter refinement strategy can improve predictive accuracy and is widely used in high-throughput data analysis [36].
Method 1 performed principal components analysis (PCA) separately on each transcriptomic data set and then uses the resulting PCs together with OTUs representing the nasal microbiota in a penalized regression model of GRSS. Methods 2 performed PCA collectively on all data types and then used the resulting PCs in a penalized regression model of GRSS. Method 3 was identical to Method 1 except for the addition of a second layer of PCA prior to a penalized regression model of GRSS. Method 4 was identical to Method 1 except the nasal microbiota OTUs also undergo dimension reduction via PCA. Finally, Method 5 combined the additions of Methods 3 & 4 resulting in a full two-layer PCA. We assessed the ability of the models produced by each of these methods to predict GRSS through leave-one-out cross-validation.

Weight assignment and transcription factor analysis
By design, the estimated linear coefficients in our integrative models are weights that represent the importance of principal components, not the features in the original data such as genes and microbes. To enhance the interpretability of these integrative models, we calculated weights for each of the original features based on a backpropagation algorithm. Details of this calculation can be found in "Feature weight calculation" the S1 Text. These weights were then used in the transcription factor analyses. As previously described, conserved binding sites from JASPER across hg19, mm10 and rn6 were identified and were mapped to 2 kb region around Transcription Start Site (TSS) for the transcription factor analysis [37]. A hypergeometric test was performed to identify enriched binding sites [38,39]. Statistical significance was assessed at a p-value threshold of 0.05 without adjustment for multiple testing.

Pathway analysis
Genes that were identified as significantly correlated with GRSS were subsequently used for canonical pathway identification and upstream regulator analysis using Ingenuity Pathway Analysis (QIAGEN Silicon Valley, Redwood City, CAQiagen). The combined feature weights were used in pathway analysis to enhance the results and were compared with the results obtained without those weights. Statistical significance was assessed at a p-value threshold of 0.05 without adjustment for multiple testing.