Transcriptomic meta-analysis reveals up-regulation of gene expression functional in osteoclast differentiation in human septic shock

Septic shock is a major medical problem with high morbidity and mortality and incompletely understood biology. Integrating multiple data sets into a single analysis framework enables discovery of new knowledge about the condition that may have been missed when each data set was analyzed individually. An electronic search of the medical literature and gene expression databases was performed to select transcriptomic studies of circulating leukocytes from human subjects with septic shock. Gene-level meta-analysis was conducted on the six selected studies to identify the genes consistently differentially expressed in septic shock. This was followed by pathway-level analysis using three different algorithms (ORA, GSEA, SPIA). The identified up-regulated pathway, the Osteoclast differentiation pathway (hsa04380), was validated in two independent cohorts. From this pathway, 25 key genes were selected to serve as an expression signature of septic shock.


Rationale
3 Describe the rationale for the review in the context of what is already known.
Genome-wide expression profiling offers a detailed picture of the condition and enables identification of genes and pathways of diagnostic, prognostic or therapeutic relevance. There have been a number of studies investigating gene expression in sepsis and septic shock leading to very interesting discoveries, such as altered zinc metabolism in sepsis, and clinically relevant grouping of cases of septic shock. The common goal of these previous analyses was to detect interesting genes associated with sepsis or septic shock. Gene-level analysis is inherently not geared toward detection of pathways, which are sometimes modulated even in the absence of significant gene-level changes in expression. All queries were made on the 3rd January 2017.

Information sources 7 Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched.
(1) National Centre for Biotechnology Information Gene Expression Omnibus (GEO) and (2)
Study selection 9 State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta-analysis).
Selection of studies was based on the organism (human subjects), tissue of origin (circulating leukocytes from whole blood samples) and the platform technology (gene expression microarray). Only data sets published as full reports were selected.
Data collection process 10 Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators.
Normalized gene expression data from the series matrix files were downloaded from GEO.
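As a rough illustration of this step, the sketch below reads the expression table out of a GEO series matrix file. It assumes the usual layout of such files, in which metadata lines start with `!` and the tab-separated table sits between the `!series_matrix_table_begin` and `!series_matrix_table_end` markers. The function name `read_series_matrix` and the helper `_to_float` are hypothetical and are not taken from the original analysis.

```python
import csv


def _to_float(cell):
    """Convert a table cell to float; blank or non-numeric cells become NaN."""
    try:
        return float(cell)
    except ValueError:
        return float("nan")


def read_series_matrix(handle):
    """Return (sample_ids, {probe_id: [values]}) from an open series matrix file.

    Assumption: the expression table lies between the
    '!series_matrix_table_begin' and '!series_matrix_table_end' lines.
    """
    table_lines = []
    in_table = False
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
            continue
        if line.startswith("!series_matrix_table_end"):
            break
        if in_table and line:
            table_lines.append(line)

    reader = csv.reader(table_lines, delimiter="\t")
    header = next(reader)          # first column is the probe identifier (ID_REF)
    sample_ids = header[1:]
    expression = {row[0]: [_to_float(x) for x in row[1:]] for row in reader}
    return sample_ids, expression
```

The rows of a series matrix are keyed by platform probe identifiers, so a mapping from probes to genes would still be needed before studies on different platforms can be compared.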
Data items 11 List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made.
Only genes common to all studies were included in the analysis.
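Restricting the analysis to shared genes amounts to a set intersection across studies. The sketch below is a minimal illustration, assuming each study is already represented as a mapping from a gene-level identifier to its values; the function name `common_genes` and the study labels in the usage example are hypothetical.

```python
def common_genes(studies):
    """Return the sorted list of gene identifiers present in every study.

    `studies` maps a study label to a dict keyed by gene identifier.
    Assumption: identifiers are already harmonized across platforms.
    """
    gene_sets = [set(per_gene) for per_gene in studies.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Usage with made-up study labels and values:
example = {
    "study_A": {"FOS": 1.2, "JUN": 0.4, "TLR2": 2.1},
    "study_B": {"FOS": 0.9, "TLR2": 1.7},
    "study_C": {"TLR2": 2.4, "FOS": 1.1, "SPI1": 0.8},
}
shared = common_genes(example)
```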

Risk of bias in individual studies
12 Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis.
It is to be noted that we did not check for 'publication bias', as this kind of bias is generally meaningful for candidate gene studies, as opposed to omics studies. (It is highly unlikely for a transcriptomic study to detect no differentially expressed gene and remain unpublished.) Additionally, had we tried to pool the transcriptomic data, many potential biases would have crept in. Therefore, we meta-analyzed the summary t-statistics, allowing us to avoid such biases. Of note, the data within each study were already available in normalized form, which removes intra-study biases. While it is still theoretically possible for some genes to behave heterogeneously between studies, this would be biologically rare within the same tissue (whole blood in this case).
All the studies chosen had mostly subjects of Caucasian ethnicity, making inter-study heterogeneity in gene expression less likely to appear. We chose our initial phenotype (Septic Shock) to be as homogeneous as possible. This reduced the number of studies in our meta-analysis to only 7. Hence, accounting for inter-study heterogeneous behavior of genes (e.g., with random-effects models) would reduce our statistical power without significant benefits.
Since we started with normalized omics data, bias is expected to have been corrected already. For all genes within each study, we applied Welch's two-sample t-test to compare expression values in SS samples with those in control samples. A single overall p-value was then computed for each gene by meta-analyzing the individual study-level t-statistics, as described below. We first transformed each individual t-statistic to a Z-score Z_i, retaining its sign, using the quantile transformation. We then followed a fixed-effects meta-analysis approach, in which the Z-scores are combined using an optimal linear combination with weights equal to the square root of the effective sample size of each study (i.e., the harmonic mean of case and control sample sizes). Since some of the studies have overlapping cases and controls, as shown in Supplementary Table S1, we derived the correlation matrix among the Z-scores of the 6 studies using a previously known correlation formula for case-control studies. The variance of the linear combination is thus derived as the sum of the variances of the numerator terms plus their pairwise covariances based on the inter-study correlations R_ij. This gave the combined meta-analyzed Z-score Z_meta.
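The procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names `welch_t` and `combine_z` are hypothetical, the inter-study correlation matrix is taken as a given input (the specific correlation formula is not reproduced here), the conversion from t-statistic to Z-score is left out because it needs a Student-t distribution function that the standard library lacks, and the two-sided p-value is an assumption.

```python
from math import sqrt
from statistics import NormalDist, harmonic_mean, mean, variance


def welch_t(cases, controls):
    """Welch's two-sample t-statistic and Welch-Satterthwaite degrees of freedom."""
    a = variance(cases) / len(cases)        # squared standard error, cases
    b = variance(controls) / len(controls)  # squared standard error, controls
    t = (mean(cases) - mean(controls)) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(cases) - 1) + b ** 2 / (len(controls) - 1))
    return t, df


def combine_z(z, n_cases, n_controls, corr):
    """Combine per-study Z-scores for one gene into Z_meta.

    Weights are sqrt(effective sample size), with the effective sample size
    taken as the harmonic mean of case and control counts. `corr` is the
    inter-study correlation matrix R (ones on the diagonal), supplied by the
    caller. Z_meta = sum(w_i * Z_i) / sqrt(sum_ij w_i * w_j * R_ij).
    """
    w = [sqrt(harmonic_mean([a, b])) for a, b in zip(n_cases, n_controls)]
    k = len(z)
    numerator = sum(w[i] * z[i] for i in range(k))
    var = sum(w[i] * w[j] * corr[i][j] for i in range(k) for j in range(k))
    z_meta = numerator / sqrt(var)
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z_meta)))  # assumption: two-sided
    return z_meta, p_two_sided
```

With an identity correlation matrix the denominator reduces to the usual weighted Z-score (Stouffer) form; positive off-diagonal entries from overlapping subjects enlarge the variance and shrink Z_meta accordingly.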

Section/topic # Checklist item Reported on page #
Risk of bias across studies 15 Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies).
Since we have started with normalized omics-data, it is expected that there has been correction of bias.
It is to be noted that we did not check for 'publication bias', as this kind of bias is generally meaningful for candidate gene studies, as opposed to omics studies. (It is highly unlikely for a transcriptomic study to detect no differentially expressed gene and remain unpublished.) Additionally, had we tried to pool the transcriptomic data, many potential biases would have crept in. Therefore, we meta-analyzed the summary t-statistics, allowing us to avoid such biases. Of note, the data within each study were already available in normalized form, which removes intra-study biases. While it is still theoretically possible for some genes to behave heterogeneously between studies, this would be biologically rare within the same tissue (whole blood in this case).
All the studies chosen had mostly subjects of Caucasian ethnicity, making inter-study heterogeneity in gene expression less likely to appear. We chose our initial phenotype (Septic Shock) to be as homogeneous as possible. This reduced the number of studies in our meta-analysis to only 7. Hence, accounting for inter-study heterogeneous behavior of genes (e.g., with random-effects models) would reduce our statistical power without significant benefits.
Additional analyses 16 Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified.

RESULTS
Study selection 17 Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram.

Figure 1
Study characteristics 18 For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations.
PMIDs of the studies included:
Risk of bias within studies 19 Present data on risk of bias of each study and, if available, any outcome level assessment (see item 12).
We conducted the analysis on omics data that had already been normalized (with internal correction for bias).

GSE IDs PMIDs
Results of individual studies 20 For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group (b) effect estimates and confidence intervals, ideally with a forest plot.
The goal of the study was to detect pathways that are consistently up-regulated in septic shock.

DISCUSSION
Summary of evidence 24 Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy makers).
Firstly, systematic analysis of multiple data sets enabled identification of the core set of genes that are consistently up-regulated in SS. Secondly, we applied three different methods (Over-Representation Analysis, Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis) to arrive at the Osteoclast differentiation pathway with consistent and significant up-regulation in SS. We validated this result with additional bioinformatic analysis and pathway gene expression assay in two independent validation data sets. Lastly, we identified 25 genes that may serve as an expression signature of SS.
In light of this finding, altered osteoclast differentiation in septic shock deserves greater attention.
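For readers unfamiliar with the first of the three methods, Over-Representation Analysis is commonly implemented as a one-sided hypergeometric test on the overlap between a gene list and a pathway. The sketch below shows that common form only; whether the ORA used here took exactly this form is an assumption, and the function name `ora_p_value` and its parameters are hypothetical.

```python
from math import comb


def ora_p_value(universe, pathway, hits, overlap):
    """Upper-tail hypergeometric p-value for pathway over-representation.

    universe: number of genes analyzed
    pathway:  number of those genes annotated to the pathway
    hits:     number of genes called up-regulated
    overlap:  number of up-regulated genes that belong to the pathway
    Returns P(X >= overlap) when `hits` genes are drawn without replacement.
    """
    total = comb(universe, hits)
    upper = min(pathway, hits)
    tail = sum(
        comb(pathway, i) * comb(universe - pathway, hits - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

A small p-value indicates that the pathway contains more up-regulated genes than expected by chance given the sizes of the gene list and the pathway.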

Limitations 25 Discuss limitations at study and outcome level (e.g., risk of bias), and at review-level (e.g., incomplete retrieval of identified research, reporting bias).
This study has a limitation in terms of heterogeneous phenotypes between the discovery and validation cohorts, such as age and organ involvement. SS is a severe form of disease in which multi-organ involvement and host response are usually similar even among different age groups. We think that this up-regulation of the osteoclast differentiation pathway is a stable signature of SS in spite of the phenotypic heterogeneity.
PRISMA 2009 Checklist
Conclusions 26 Provide a general interpretation of the results in the context of other evidence, and implications for future research.
We have identified a set of genes whose expression in septic shock may serve as a signature. In light of this finding, altered osteoclast differentiation in septic shock deserves greater attention.

FUNDING
Funding 27 Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review. For more information, visit: www.prisma-statement.org.