Conceived and designed the experiments: LW BZ RW. Performed the experiments: LW. Analyzed the data: LW BZ XC. Wrote the paper: LW BZ RW XC.

The authors have declared that no competing interests exist.

Gene class, ontology, or pathway testing analysis has become increasingly popular in microarray data analysis. Such approaches allow the integration of gene annotation databases, such as Gene Ontology and KEGG Pathway, to formally test for subtle but coordinated changes at a system level. Higher power in gene class testing is gained by combining weak signals from a number of individual genes in each pathway. We propose an alternative approach for gene-class testing based on mixed models, a class of statistical models that:

In microarray data analysis, when statistical testing is applied to each gene individually, one is often left with too many significant genes that are difficult to interpret or too few genes after a multiple comparison adjustment. Gene-class, or pathway-level testing, integrates gene annotation data such as Gene Ontology and tests for coordinated changes at the system level. These approaches can both increase power for detecting differential expression and allow for better understanding of the underlying biological processes associated with variations in outcome. We propose an alternative pathway analysis method based on mixed models, and show this method provides useful inferences beyond those available in currently popular methods, with improved power and the ability to handle complex experimental designs.

To help increase power to detect microarray differential expression and to better interpret findings, gene-class testing or pathway analysis has become increasingly popular

The most commonly used approach for pathway analysis, enrichment or overrepresentation analysis, uses Fisher's exact test. This method starts with a list of differentially expressed genes based on an arbitrary cutoff of nominal p-values, and compares the number of significant genes in the pathway with that among the rest of the genes to determine whether any gene-set is overrepresented in the significant gene list. Fisher's exact test is implemented in a number of software packages such as GOTM
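As an illustration of the overrepresentation test described above, the following minimal sketch computes a one-sided Fisher's exact (hypergeometric) p-value for a single gene-set. The function name, argument names, and the example counts are hypothetical and are not taken from any of the packages mentioned.

```python
from math import comb


def enrichment_p_value(n_genes, n_significant, set_size, set_significant):
    """One-sided Fisher's exact p-value for over-representation.

    n_genes         : total number of genes on the array
    n_significant   : number of genes called significant at the chosen cutoff
    set_size        : number of genes in the gene-set
    set_significant : number of significant genes inside the gene-set
    Returns P(X >= set_significant) under the hypergeometric null.
    """
    upper = min(set_size, n_significant)
    total = comb(n_genes, set_size)
    tail = sum(
        comb(n_significant, k) * comb(n_genes - n_significant, set_size - k)
        for k in range(set_significant, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1500 genes, 100 significant, a 30-gene set with 8 hits.
p = enrichment_p_value(1500, 100, 30, 8)
print(f"enrichment p-value: {p:.3g}")
```

The result depends on the cutoff used to call genes significant, which is the arbitrariness noted above.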

More recent approaches such as Gene Set Enrichment Analysis (GSEA)

Other examples of permutation- and bootstrap-based methods include SAFE

Some parametric methods and their comparisons with the proposed method are in order. Wolfinger et al.

In

Given two groups of samples and an

We assume reliable numerical values are obtained from gene expression intensities and are on the log2 scale. In single-color arrays, the expression values for each gene are derived from each spot on the array; in two-color arrays, the expression values for each gene can be the original intensities or the ratios of expression values for the experimental sample compared to the reference sample. When multiple probe sets for a gene are present, they can be mapped to standard gene IDs such as the Ensembl Gene IDs (

Next, to homogenize variances for all the genes included in the mixed model and to make their means comparable, we standardize the values for each gene using the control group mean and standard deviation. Specifically, the mean and standard deviation of each gene are calculated from the control patients, and all values for that gene are standardized by subtracting the control group mean and dividing by the control group standard deviation. The standardized gene expression values then represent the number of standard deviations away from the “normal” gene expression values. In a time course experiment, expression values at baseline can similarly serve as control group data to standardize all measurements in the time course.
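The standardization step can be sketched for a single gene as follows. This is a minimal illustration: the function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which estimator is used.

```python
from statistics import mean, stdev


def standardize_by_control(values, is_control):
    """Standardize one gene's expression values against its control samples.

    values     : log2 expression values for one gene, one per array
    is_control : booleans of the same length, True for control arrays
    Returns the values expressed in control-group standard deviations.
    """
    control = [v for v, c in zip(values, is_control) if c]
    center = mean(control)
    spread = stdev(control)  # assumption: sample SD with n - 1 denominator
    if spread == 0:
        raise ValueError("control group has zero variance for this gene")
    return [(v - center) / spread for v in values]


# Hypothetical gene with three control arrays and one case array.
z = standardize_by_control([1.0, 2.0, 3.0, 5.0], [True, True, True, False])
print(z)
```

For a time course, the same function could be applied with baseline arrays flagged as the control group.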

Linear mixed models are a class of statistical models that handle data in which observations are not independent, such as gene expression values from the same array. They include both fixed effects and random effects, and thus are called mixed effect models. The fixed effects model the systematic effects, or the mean structure, of the data, and the random effects account for the complex covariance structure of the observations, such as that between genes. In addition, they allow inferences to be made about the entire population of samples from which the observed samples arise.
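To see how a random effect induces correlation among observations from the same array, consider a generic random-intercept model. The notation below is illustrative only and is not the notation of the models proposed in this paper: $y_{lm}$ is the value for gene $m$ on array $l$, $a_l$ is a random array effect, and $\varepsilon_{lm}$ is the residual error.

```latex
\begin{align*}
y_{lm} &= \mu + a_l + \varepsilon_{lm}, \qquad
a_l \sim N(0, \sigma_a^2), \quad
\varepsilon_{lm} \sim N(0, \sigma^2), \quad
a_l \text{ independent of } \varepsilon_{lm}, \\
\operatorname{Var}(y_{lm}) &= \sigma_a^2 + \sigma^2, \\
\operatorname{Cov}(y_{lm}, y_{lm'}) &= \sigma_a^2 \quad (m \neq m'), \\
\operatorname{Cov}(y_{lm}, y_{l'm'}) &= 0 \quad (l \neq l'), \\
\operatorname{Corr}(y_{lm}, y_{lm'}) &= \frac{\sigma_a^2}{\sigma_a^2 + \sigma^2} \quad (m \neq m').
\end{align*}
```

Under these assumptions, two genes measured on the same array share the term $a_l$ and are therefore positively correlated, whereas values from different arrays remain independent.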

Assuming that, after data pre-processing, there is one measurement per gene from each array, we propose the following basic linear mixed model for comparing differential expression patterns in the pathway (or gene-set) _{jk}

While _{jk}_{l}_{m}_{(g)} for _{gjklm}^{2}). Parameters from the mixed model are estimated using the method of restricted maximum likelihood (REML) along with appropriate standard errors.

The hypothesis we are testing is whether the amount of differential expression between cases and controls for gene-set genes is significantly different from that for the other genes. This is essentially the interaction effect between gene-set and group. In terms of Model 1, we want to test $H_0: (\mu_{11}-\mu_{10})-(\mu_{01}-\mu_{00}) = 0$. Here, $\mu_{11}-\mu_{10}$ represents differential expression for genes in the pathway and $\mu_{01}-\mu_{00}$ represents differential expression for the rest of the genes.
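The interaction contrast being tested can be illustrated with a simple point estimate computed from the four cell means (gene-set membership by group). This sketch is not the mixed model fit itself: it ignores the random effects and provides no standard error, both of which would come from the REML fit. The function name and the record layout are hypothetical.

```python
from statistics import mean


def interaction_contrast(records):
    """Point estimate of the gene-set by group interaction.

    records : iterable of (in_set, is_case, value) tuples, where value is a
              standardized expression value
    Returns (case - control among gene-set genes)
          - (case - control among all other genes).
    """
    cells = {(j, k): [] for j in (0, 1) for k in (0, 1)}
    for in_set, is_case, value in records:
        cells[(int(in_set), int(is_case))].append(value)
    m = {key: mean(vals) for key, vals in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Hypothetical data: gene-set genes shift by 1.0, other genes by 0.2.
data = [
    (True, True, 1.0), (True, False, 0.0),
    (False, True, 0.2), (False, False, 0.0),
]
print(interaction_contrast(data))
```

A value near zero indicates that gene-set genes change by about the same amount as the remaining genes, which corresponds to the null hypothesis.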

In feedback or reverse regulation, genes in a gene-set may shift in both directions in response to an input signal; that is, a fraction of the gene-set genes are up-regulated and another fraction are down-regulated. Testing changes in the entire gene-set will then not be effective, as the changes in different directions will cancel each other out. Instead, we propose modeling reverse regulation with

Because the direction of change _{0}:{[(_{11}−_{10})−(_{01}−_{00})|_{11}−_{10})−(_{01}−_{00})|

Generate gene expression values for

For each null gene-set, fit Model 2 to data and calculate t-statistics _{D}

Consider t-statistics for all null gene-sets, let $t_{D+} = t_D - \min(t_D)$, so that $t_{D+} \ge 0$. The Box-Cox transformation of

With estimated _{D}_{+,TEST} is calculated by subtracting minimum from t-statistics of all gene-sets to be tested. The p-value for a particular gene-set

We use the Monte Carlo simulation approach

Once we obtain nominal p-values from the steps described above, we next calculate adjusted p-values to control the False Discovery Rate (FDR). An adjusted p-value of 0.05 for a gene set indicates that, among all significant gene sets selected at this threshold, 5 out of 100 are expected to be false leads.
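One common way to obtain FDR-adjusted p-values is the Benjamini-Hochberg step-up procedure. The sketch below assumes that procedure for illustration; the text here does not name the specific adjustment used, and the function name is hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the BH step-up procedure is used to control the FDR.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values for four gene-sets.
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Gene-sets whose adjusted p-value falls below the chosen threshold (for example 0.05) would then be reported as significant.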

In Models 1 and 2, we assume normal distributions for the random effects:

Another important advantage of random effects is that they help capture the heterogeneous covariations across genes. In particular, the

The _{l}_{g}_{g}_{m}_{(g)} = _{g}_{m}_{(g)} varies depending on _{g}

Using matrix algebra, it can be shown that _{glm}^{2} is residual variance associated with measurement errors and_{g}

We performed a simulation study to assess the sensitivity and specificity of a mixed model approach compared with GSEA and PAGE which also test hypothesis Q1 in Tian et al.

1−

| Scene | tot_p | up_p | mu | Mixed Model | GSEA | PAGE |
|---|---|---|---|---|---|---|
| 1 | 0.3 | 0.5 | 0.2 | 0.6158 | 0.5468 | 0.5453 |
| 2 | 0.3 | 0.5 | 0.4 | 0.9346 | 0.6762 | 0.5852 |
| 3 | 0.3 | 0.5 | 0.6 | 0.9986 | 0.7349 | 0.6230 |
| 4 | 0.5 | 0.5 | 0.2 | 0.7735 | 0.7417 | 0.5452 |
| 5 | 0.5 | 0.5 | 0.4 | 0.9868 | 0.7321 | 0.5851 |
| 6 | 0.5 | 0.5 | 0.6 | 1.0000 | 0.7373 | 0.6225 |
| 7 | 0.8 | 0.5 | 0.2 | 0.9106 | 0.7394 | 0.5063 |
| 8 | 0.8 | 0.5 | 0.4 | 1.0000 | 0.7373 | 0.5064 |
| 9 | 0.8 | 0.5 | 0.6 | 1.0000 | 0.7373 | 0.5062 |
| 10 | 0.3 | 0.8 | 0.2 | 0.7074 | 0.6395 | 0.7002 |
| 11 | 0.3 | 0.8 | 0.4 | 0.8814 | 0.8484 | 0.8755 |
| 12 | 0.3 | 0.8 | 0.6 | 0.9718 | 0.9710 | 0.9683 |
| 13 | 0.5 | 0.8 | 0.2 | 0.8472 | 0.7173 | 0.8456 |
| 14 | 0.5 | 0.8 | 0.4 | 0.9872 | 0.9750 | 0.9888 |
| 15 | 0.5 | 0.8 | 0.6 | 0.9999 | 0.9957 | 1.0000 |
| 16 | 0.8 | 0.8 | 0.2 | 0.9551 | 0.8969 | 0.9572 |
| 17 | 0.8 | 0.8 | 0.4 | 1.0000 | 0.9956 | 1.0000 |
| 18 | 0.8 | 0.8 | 0.6 | 1.0000 | 0.9964 | 1.0000 |

Therefore, among all the genes in the gene-set, there were 30×

The javaGSEA implementation was used for GSEA analysis and the algorithm described on page 10 of

To compare the performances of Mixed Model 1 with GSEA and PAGE, we generated 20 datasets for each set of parameters

In terms of AUC, when most genes are shifted in one direction (up_p = 0.8), the mixed model and PAGE performed similarly, and they both outperformed GSEA consistently across scenarios 10–18 (

For each scene, there were 20 simulated datasets, each with 1500 genes assigned to 50 gene-sets, of which only the first gene-set (gene-set 1) included genes associated with the outcome by design. The test results from each method were compared with the true classification of the gene-sets. The AUC measures the ability of a test to correctly classify whether a gene-set is associated with the outcome. In scenes 1–9, when genes were shifted in both directions equally (up_p = 0.5), the mixed model outperformed both GSEA and PAGE. In scenes 10–18, when most of the genes were shifted in one direction (up_p = 0.8), the mixed model and PAGE performed similarly, and they both outperformed GSEA, especially when the magnitude of differential expression in gene-set 1 was small (scenes 10, 13, 16).
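The simulation design and the AUC summary can be sketched as follows. Only the layout (1500 genes in 50 gene-sets, with treatment effects added to gene-set 1 according to tot_p, up_p and mu) follows the description above. The number of arrays per group, the independent standard normal noise, the rounding of gene counts, and all function names are assumptions made for illustration.

```python
import random


def simulate_dataset(tot_p, up_p, mu, n_per_group=10, n_sets=50,
                     set_size=30, seed=0):
    """Simulate one dataset; index 0 of gene_set corresponds to gene-set 1.

    Assumptions: independent N(0, 1) noise and n_per_group arrays per group.
    """
    rng = random.Random(seed)
    n_genes = n_sets * set_size
    n_treated = round(set_size * tot_p)   # genes in gene-set 1 with an effect
    n_up = round(n_treated * up_p)        # of those, genes shifted by +mu
    shifts = [0.0] * n_genes
    for g in range(n_treated):
        shifts[g] = mu if g < n_up else -mu
    control = [[rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
               for _ in range(n_genes)]
    treated = [[rng.gauss(shifts[g], 1.0) for _ in range(n_per_group)]
               for g in range(n_genes)]
    gene_set = [g // set_size for g in range(n_genes)]
    return control, treated, gene_set


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


control, treated, gene_set = simulate_dataset(tot_p=0.5, up_p=0.8, mu=0.4)
print(len(control), gene_set.count(0))
```

Here `scores` would be a per-gene-set evidence measure from a given method (for example, 1 minus the p-value) and `labels` would mark whether the gene-set truly contains affected genes.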

Mootha et al.

| Pathway | Size | Nominal p-value (GSEA) | Nominal p-value (Mixed Model) | FDR Adj. p-value (Mixed Model) |
|---|---|---|---|---|
| OXPHOS_HG_U133A_probes | 114 | 0.003 | 1.40E-12 | 2.11E-10 |
| c18_U133_probes | 248 | 0.932 | 4.43E-07 | 3.34E-05 |
| human_mitoDB_6_2002_HG_U133A_probes | 594 | 0.091 | 6.97E-06 | 3.51E-04 |
| mitochondr_HG_U133A_probes | 615 | 0.087 | 2.03E-05 | 7.68E-04 |
| c25_U133_probes | 64 | 0.246 | 9.07E-04 | 0.027 |
| MAP00350_Tyrosine_metabolism | 47 | 0.965 | 0.00110 | 0.028 |
| c19_U133_probes | 203 | 0.778 | 0.00253 | 0.048 |
| MAP00010_Glycolysis_Gluconeogenesis | 91 | 0.759 | 0.00255 | 0.048 |
| MAP00500_Starch_and_sucrose_metabolism | 30 | 1 | 0.00294 | 0.049 |

The results show that both the mixed model and GSEA selected the pathway “OXPHOS_HG_U133A_probes” as the most significantly changed pathway and ranked the pathways “human_mitoDB_6_2002_HG_U133A_probes” and “mitochondr_HG_U133A_probes” high on their lists of significant pathways. While the mixed model selected 9 gene-sets at the 5% FDR level, all FDR adjusted p-values for the GSEA method were greater than 0.2 (the minimum was 0.447). As diabetes is primarily a chronic disorder of carbohydrate metabolism, additional pathways identified by the mixed model, such as “Glycolysis/Gluconeogenesis” and “Starch and sucrose metabolism”, make biological sense. Chronic diabetes has also been associated with changes in “Tyrosine metabolism”

We next applied the mixed model method to a dose-response microarray experiment. West et al.

Our main objective was to identify gene sets with significant monotone changes over doses and to assess whether the changes were similar for the two treatment durations. With permutation based methods such as GSEA, one needs to decide

We next describe the analysis workflow. First, probe sets were mapped to Ensembl Gene IDs and median expression levels for multiple probe sets corresponding to the same gene were calculated. After this step, we were left with 17278 genes and they were tested for enrichment against gene sets generated based on the biological process categories in Gene Ontology. Genes in the human genome were mapped to GO categories according to Ensembl annotation (

Next, we calculated the mean and standard deviation for each gene at dose 0 for each treatment duration separately and then used these values to standardize all gene expression values. That is, the values for each gene were standardized by subtracting the dose 0 mean and dividing by the dose 0 standard deviation. The standardized gene expression values then represented the number of standard deviations away from the “normal” gene expression at dose 0.

Finally, we applied the mixed model with fixed effects Dose, Treatment Duration, and Dose×Treatment Duration to the gene expression values. Because the data were collected at different times, the variable Batch was also added to adjust for the effects of different batches. In addition, a random Array effect was included in the model to account for correlations among genes from the same array and to facilitate inference to an entire population of arrays, not only to those considered in this study. Contrasts of parameters from this model, based on orthogonal polynomial coefficients, were then used to test for a linear trend of expression values over doses and for the Duration×Linear trend effect. The orthogonal polynomial coefficients are linear transformations of the natural polynomial scores, and they alleviate the collinearity problems of natural polynomial scores. Adjusted p-values were then computed using the R
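The linear-trend contrast can be illustrated with the following sketch, which derives first-order orthogonal polynomial coefficients by centering and scaling the dose scores and then applies them to a set of dose-level means. The dose values and means shown are hypothetical, as is the choice to use the actual dose spacing rather than equally spaced scores; the function names are also illustrative.

```python
def linear_trend_contrast(doses):
    """First-order orthogonal polynomial coefficients for the given doses.

    The coefficients are the centered doses scaled to unit length, so they
    sum to zero and are a linear transformation of the natural scores.
    """
    n = len(doses)
    center = sum(doses) / n
    centered = [d - center for d in doses]
    norm = sum(c * c for c in centered) ** 0.5
    return [c / norm for c in centered]


def contrast_estimate(coefficients, group_means):
    """Apply contrast coefficients to the estimated dose-level means."""
    return sum(c * m for c, m in zip(coefficients, group_means))


# Hypothetical doses and standardized mean expression at each dose.
doses = [0, 5, 20, 60]
coefs = linear_trend_contrast(doses)
print(coefs)
print(contrast_estimate(coefs, [0.0, 0.1, 0.5, 1.4]))
```

Because the coefficients sum to zero, a flat dose-response profile yields a contrast estimate of zero, while a monotone increase or decrease yields a positive or negative estimate, respectively.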

Because we were mainly interested in gene sets directly responding to changes in HNE, our analysis focused on gene sets with significant linear trends of expression values corresponding to monotone changes over doses. At the adjusted p-value level of 0.01, we identified 5 and 1 responsive gene sets for 6 h and 24 h treatment, respectively (

tot_p = proportion of genes with treatment effect added to treatment group in gene-set 1; up_p = among treated genes, the proportion of genes for which positive treatment effect mu was added; 1−up_p = among treated genes, the proportion of genes for which negative treatment effect −mu was added. See text for details of the simulation experiment.

When individual tests were conducted with the 6 h and 24 h treatment samples separately, only 5 and 1 gene-sets, respectively, were significant at the 0.01 FDR level. However, when all samples were used to test gene-sets with a non-significant Duration×Linear Trend interaction, 40 gene-sets were significant at the 0.01 FDR level. This shows that pooling data with similar trends from both treatment durations improves the statistical power for identifying biologically meaningful gene sets.

On the other hand, the interaction tests were also used to select gene sets showing different response trends for the 6 h and 24 h treatments. Among the 12 gene sets with significant interactions (p-value<0.01), 8 of them were responsive for 6 h treatment (adjusted p-value<0.05) but not for 24 h treatment (adjusted p-value>0.95, see

In this paper, we have proposed linear mixed models for the analysis of microarray data at the pathway-level. This flexible, unified and practical approach can be easily implemented in common statistical software packages. The proposed model makes three main improvements over popular methods for gene-set testing: improved power through testing location shift of gene-set genes, more refined modeling of covariance structure between genes through specification of random effects, and the ability to account for complicated experimental designs through inclusion of design factors and covariate effects.

As suggested by Tian et al.

The use of random effects to account for a general covariance structure that varies according to genes in the proposed models represents our effort to improve the covariance structure modeling of current parametric methods. False positives are likely to result when dependencies between genes are not accounted for

On the other hand, the strength of parametric methods such as the proposed mixed models lies in their ability to account for complicated design information. When there are multiple sources of covariation in the data, permutation or resampling methods are often difficult to employ. In contrast, mixed Models 1 and 2 can be easily extended to handle a variety of more complex designs. For example, for two-color arrays and other arrays with multiple measurements per gene on each array, Model 1 can be augmented with additional random effects corresponding to spot or block effects. When arrays are processed in multiple batches, a batch effect can be added to the model to adjust for systematic effects from different batches. Similarly, other random effects from blocks and sites where the experiments were performed can also be incorporated into the models. In the

Average standardized gene expression values for each dose and each treatment duration.

(0.77 MB PDF)

Supplementary table for HNE data.

(0.24 MB XLS)

Dr. Lawrence J. Marnett's group at Vanderbilt University provided the HNE dose response microarray data. The authors would like to thank the reviewers and editors for helpful suggestions that improved an earlier version of this manuscript.