Self-Contained Statistical Analysis of Gene Sets

Microarrays are a powerful tool for studying differential gene expression. However, lists of many differentially expressed genes are often generated, and unraveling meaningful biological processes from the lists can be challenging. For this reason, investigators have sought to quantify the statistical probability of compiled gene sets rather than individual genes. The gene sets typically are organized around a biological theme or pathway. We compute correlations between different gene set tests and elect to use Fisher’s self-contained method for gene set analysis. We improve Fisher’s differential expression analysis of a gene set by limiting the p-value of an individual gene within the gene set to prevent a small percentage of genes from determining the statistical significance of the entire set. In addition, we also compute dependencies among genes within the set to determine which genes are statistically linked. The method is applied to T-ALL (T-lineage Acute Lymphoblastic Leukemia) to identify differentially expressed gene sets between T-ALL and normal patients and T-ALL and AML (Acute Myeloid Leukemia) patients.


Introduction
Microarrays allow investigators the opportunity to identify individual genes that are differentially expressed. However, a list of single genes often does not provide insight into different biological themes that distinguish the two phenotypes. For this reason, investigators have sought to incorporate gene sets in their analysis. A priori compiled gene sets group individual genes in biologically related sets. Analyzing gene sets rather than individual genes can improve sensitivity and prediction [1]. For example, a gene set may prove to be significant despite the fact that its individual genes may not be significant [2]. Gene sets can be created based on biological function, metabolic pathway or chromosome. Curated databases include KEGG [3], Reactome [4], Gene Ontology (GO) [5], and the Molecular Signatures Database or MSigDB [6], which serves as a repository for human genes and includes databases from KEGG, Reactome, Bio-Carta, and GO.
There are two different approaches when analyzing gene sets. The first type (designated competitive) compares the gene set with its complement when assessing differential expression. Competitive techniques include the Gene Set Enrichment Analysis (GSEA) [2] and the SAFE technique [7]. The second type (designated self-contained) only tests differential expression using the genes within its set.
In the competitive method, the success of a gene set is dependent on the size and nature of its complement. Goeman and Bühlmann [8] advise against the use of competitive methods and Dinu et al. [9] show that GSEA does not properly identify differentially expressed gene sets from a mouse-microarray dataset with simulated genes.
In contrast, self-contained methods, while less popular, only consider those genes within the set for analysis and compute a significance level that is not dependent on genes outside the set. Fridley [10] evaluates a number of self-contained methods. Among them are Stouffer's method [11], which computes a z-value for the set by "averaging" z-values from the K individual genes in the set, and Taylor and Tibshirani (2006) [12], who first order the individual p-values p 1 p 2 . . . p K and use the Tail Strength (TS) statistic, The Kolmogorov-Smirnov (K-S) test [13,14] computes the maximum difference between two distributions which translates into the statistic, Dinu et al. [9] use the L 2 norm of a t-like statistic vector P K i¼1 d 2 i and a permutation method to assess the significance of a gene set in their Significance Analysis of Microarray to Gene-Set analyses (SAM-GS) method. Others include Tomfohr et al. [15] who use a singular value decomposition of expression levels to identify a metagene which is the eigenvector associated with the largest eigenvalue. Activity levels are compared using the t-test.
Kong et al. [16] use Hotelling's T 2 statistic (a multiple variable version of the t-test) to assess the significance of a gene set, where n 1 and n 2 are the sizes of groups 1 and 2, " X ð1Þ k and " X ð2Þ k are the mean vectors of the individual groups, " X ðiÞ k ¼ 1 n i P n i j¼1 X ðiÞ kj , X ð1Þ kj and X ð2Þ kj represent the expression level of gene k for patient j for groups 1 and 2, and S is the pooled covariance matrix is the covariance matrix for group i. Self-contained methods are also used by authors Geoman et al. [17] who base their analysis on a logistic regression model, and Mansmann and Meister [18] who use the Analysis of Covariance (ANCOVA).
We use Fisher's method [19] which follows a chi-squared distribution to perform our selfcontained analysis of gene sets. In addition to having an analytical distribution, Fisher's method has been shown to be asymptotically Bahadur optimal by Pallini [20]. Fisher's method, like most self-contained methods, combines the numerical p-values of all the individual genes in the set to form a consolidated p-value. However, caution must be exercised since a few genes can dominate the statistical behavior of the entire gene set. Therefore, we modify Fisher's method and set a minimum threshold value for individual p-values, thus preventing a few genes from dominating the entire p-value of the gene set. We believe this modification improves the suitability of Fisher's statistic for evaluating the differential expression of gene sets.
Self-contained methods often employ permutation methods [1] to compute a consolidated p-value since individual genes from gene sets cannot be assumed to be independent. In the permutation approach, the patient expression levels are permuted and the unpermuted test statistic (e.g. Fisher's F) is evaluated against the statistics (e.g. permuted F values) generated by the permutations. To account for dependencies among individual genes, we also use a permutation method in conjunction with Fisher's method to evaluate the significance of a gene set.
However, in addition our method evaluates dependencies among pairs of genes during the permutation process and creates a heat map of the dependencies for the gene set. Specifically, we evaluate the probability that gene A is differentially expressed given that gene B is differentially expressed in the arbitrary groups that are created during the permutation process. Thus an investigator not only knows if the gene set is significant but what genes are linked together within the set. A high level of dependency among the genes in a gene set may increase the set's potential to be selected as a differentially expressed set.
We apply the method to identify differentially expressed gene sets when T-ALL (T-lineage Acute Lymphoblastic Leukemia) patients are compared to healthy patients and AML (Acute Myeloid Leukemia) patients using Affymetrix microarray datasets. We use the publicly available Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/) accession numbers GSE46170 [21], GSE13204 [22][23][24], and GSE36133 [25] for our analysis. Microarray chip Human Genome U133 Plus 2.0 was used in the databases. Preprocessing and normalization for GSE13204 and GSE36133 are discussed in [23,26] and [25] respectively.
Our paper is organized as follows. We discuss Fisher's method and our modification to Fisher's method in Section 2. Fisher's method is also compared to other self-contained methods using a correlation and power study. Section 3 describes how dependencies in a gene set are accounted for and computed. When many gene sets are tested for significance, there is an increased probability that one may find false positives. We adjust for the multiple tests through the false discovery rate which is discussed in Section 4. Section 5 discusses our results using the Gene Expression Omnibus datasets and Section 6 concludes.

Analyzing gene sets for differential expression using Fisher's method
Given the probability levels p k of K individual genes, Fisher [19] combines the p-values in a set using the expression, When the individual genes are independent, F follows a chi-squared distribution with 2K degrees of freedom from which a consolidated p-value can be determined for the entire set. Table 1 compares pairs of self-contained methods by constructing Pearson's r coefficients. Each entry in the table computes r from 100,000 p-values from two different self-contained methods. The p-values are themselves computed using two simulated gene sets, each composed of K = 20 genes and n = 100 patients generated by sampling from a standard normal distribution (μ = 0, σ = 1). We see that Fisher's method is highly correlated with SAM-GS and Stouffer's method.
One vulnerability of Fisher's method (and other self-contained methods) is that a small subset of genes can conspire to generate a small consolidated p-value for the entire set of K genes. Whitlock [27] notes that Fisher's method is asymmetrically sensitive to small p-values and elects to used a weighted Z-method. Table 2 shows the average p-values of varying gene subsets that will cause the entire set to be significant at a probability level of α = .01. For example, a single gene whose p-value is 3.5 × 10 −6 or less will cause the entire set of 10 genes to be significant at a consolidated p-value of α = .01. We assume the remaining 9 genes (or K-1 genes in general) to have p-values of.5. Similarly, three genes whose p-values are 1.2 × 10 −3 or less will cause the entire set of 20 genes to be significant.
To prevent a few genes from dominating the statistical significance of the entire gene set, we modify Fisher's method. Our adjusted Fisher's test sets a lower limit p min on p-values  [28] also modify Fisher's method, but in contrast, limit the maximum value for p-values to improve the statistical power for rejecting a null hypothesis. Concern for false positives motivates Chai et al. [29] to adjust Fisher's method using Brown's approximation [30]. The probability density function (PDF) of our modified Fisher's method can be constructed for different values of p min in order to properly evaluate the p-value associated with an F value. The PDF of the modified Fisher's method can then be used to determine the minimum number of genes required to make the consolidated p-value of the gene set less than some value α. Specifically, we determine the smallest value K min such that where F α is the value of F for which the area to the right of the PDF of the modified Fisher's method is less than α. Table 3 shows the minimum number of genes required to achieve a gene set significance level of α = .01 for the modified Fisher's method using different levels of p min and different gene set sizes. As p min increases, more genes need to have individual p-values of p min or less. For the unmodified Fisher's method (p min = 0) or chi-squared distribution, only one gene is required. We also note that the proportion of genes required to have p-values of p min or less decreases as the gene set size increases. Fig 3 shows the proportion of genes that need to have individual p-values of p min or less in order for the entire gene set to achieve a significance level of α = .01 for different gene set sizes and levels of p min . One potential concern in using the modified Fisher's method is that the probability density function (PDF) for different combinations of p min and gene set sizes K needs to be constructed. However, in our gene set analysis, we avoid having to compute the PDF because we use the permutation method (as discussed in the next section) to evaluate the p-value of a gene set which is based on a ranking of Fisher's F values. Thus the need to extract the p-value associated with an F value from the modified probability density function is eliminated.
To further evaluate the modified Fisher's method, we compare the power of each self-contained method in Table 4 by creating two gene sets each composed of K = 20 genes and n = 50, n = 100, or n = 200 patients. Expressions levels are created by sampling from a standard normal distribution (σ = 1) with a mean of (μ = 0) for the first set and a mean of (μ = .15) for the second set. Power is determined using 10,000 permutations. As expected, the power increases as the number of patients increases. Fisher's method exhibits equal or slightly higher power compared to other methods and shows little variation in power as p min changes. The higher power of Table 3. Minimum number of genes required to achieve a global gene set significance of α = .01 using the modified Fisher's method at different levels of p min .

Number of genes (K)
p min = .05 p min = .01 p min = .001 p min = 0 Fisher's method is consistent with the results from Fridley et al. [10]. Table 5 compares the fraction of incorrect H 0 rejections (Type I errors) for different self-contained methods. Two gene sets are created, each of which is composed of K = 20 genes and n = 100 patients. Expression levels are created for each set by sampling from a standard normal distribution (μ = 0, σ = 1). Table 5 shows the fraction of the 10,000 genes sets which produce p-values (evaluated using 10,000 permutations) that are less than.05. We see that most methods commit approximately.05 Type I errors. The value of p min has little effect on the fraction of Type I errors committed by Fisher's method.

Accounting for dependencies among genes
Fisher's method assumes the genes in a gene set act independently. However genes in a set are often grouped together because they share a common biological function. Thus gene independence cannot be assumed. To overcome this dilemma, investigators (e.g. [8], [9], [10], [16]) and we compute a distribution by permuting the patient phenotypic labels. The p-value of the gene set is then determined by comparing the rank of the unpermuted statistic relative to the other permutations.  Table 4. Power of different self-contained gene set methods for different patient sizes, K = 20 genes, and 10,000 gene sets. Expressions levels are created by sampling from a standard normal distribution (σ = 1) with a mean of (μ = 0) for the first set and a mean of (μ = .15) for the second set. To assess the difference in p-values for correlated and independent genes, we perform a simulation where 100 gene expression levels of 200 patients are randomly sampled from two normal distributions (100 patients in each group) for different levels of correlation r. The p-value of the gene set is computed with Fisher's method (which assumes the genes are independent and uncorrelated) and with 100,000 permutations (which accounts for the fact that the genes are correlated). Let us denote the former by p uncorr and the latter by p corr . We compute p uncorr by using the cumulative chi-squared distribution and p corr by ranking the Fisher's F values. For the purposes of this numerical experiment, our simulation uses the unmodified Fisher's method, p min = 0. The genes are correlated by constructing and Cholesky factoring a correlation matrix for different levels of correlation r. Subsequently, the ln(p uncorr ) values are plotted on the x-axis and the ln(p corr ) values are plotted on the y-axis at different correlation levels. A linear regression line is then fitted for each correlation level r, r = .05, r = .25, r = .50 and mixed correlation levels (r = .15 + .15sin(2πk 1 k 2 ) where k 1 and k 2 refer to two different genes). We use the equation

Number of Patients
or equivalently to model the relationship between p corr and p uncorr . Eq (5) can be used to fit a least squares regression line through the data. For example, for correlation level r = .05, we calculate m = .27 and b = .40. Fig 4 shows the relationship between the correlated p-value and the uncorrelated p-value. We notice three trends. First, as expected, p corr is higher than p uncorr for r > 0. Second, the ratio p corr /p uncorr increases as the level of correlation (r) increases, which is also not surprising. Finally, the ratio p corr /p uncorr increases as p uncorr decreases. Table 6 further illustrates this third trend by comparing p uncorr , p corr , and the ratio p corr /p uncorr at correlation level r = .05 and calculated values m = .27 and b = . 40. We see that the ratio p corr /p uncorr is only 12 for p uncorr = 10 −2 but 2.5 × 10 8 for p uncorr = 10 −12 . The benefit of fitting a regression line using Eq (5) is that m, the slope of the line, and b, its y-intercept, can be used in Eq (4) to predict the correlated pvalue (p corr ) using its uncorrelated (p uncorr ) value. We find that for the databases encountered in Section 5.2, such an approach is required since the number of permutations needed to compute p corr is prohibitively large. The coefficients m and b in Eq (5) are first extracted from the regression line using a computationally acceptable number of permutations. Then the permutated p-value or p corr is extrapolated using Eq (4). Nonzero values of p min can also be accommodated since the functional relationship between p corr and p uncorr is built during the permutation process.
We also compute dependencies between each pair of genes in the gene set during the permutation process. To accomplish this, we compute the proportion of permutations in which gene A is significant (at some p-value level), gene B is significant, and genes A and B are simultaneously significant. These proportions correspond to the probabilities P(A), P(B) and P(AandB). Since P(AandB) = P(A)P(B|A) and P(AandB) = P(B)P(A|B), the probabilities P(B|A) and P(A|B) can be computed. A heat map of P(A|B)/P(A) for all pairs of genes A and B in a Table 5. Fraction of Type I errors for gene set methods, K = 20 genes, n = 100 patients, 10,000 gene sets. Expressions levels for each gene set are created by sampling from a standard normal distribution (μ = 0, σ = 1).

Fisher p min = 0
Fisher p min = . gene set can be plotted. If P(A|B)/P(A) > 1, the differential expression of P(B) increases the probability that gene A is differentially expressed by factor P(A|B)/P(A). We also note that the factor P(A|B)/P(A) is symmetric, P(A|B)/P(A) = P(B|A)/P(B). We emphasize for clarity that, due to the arbitrary groups created during the permutation process, P(A) and P(B) do not represent the probability that genes A and B are differentially expressed for the original unpermutated groups.

False discovery rate
Since many gene sets are tested in our method, we must account for the increased probability of achieving false positives when using multiple tests. We choose not to use the Family Wise Error Rate (FWER) which reduces the probability that one or more false positives are reported to be less than α since FWER methods suffer from increased Type II errors [31]. Instead, we use the false discovery rate (FDR) method of Benjamini and Hochberg [32]. The false discovery rate is the expected fraction of false positives in the number of reported positives. In the Benjamini and Hockberg (BH) method [32], the p-values of all sets are ordered from smallest to largest. Then the largest index k is found such that p k < ka K . All alternative hypothesis are retained for gene sets i k.

ln(p corr ) vs ln(p uncorr ) at different correlation levels (r) where p corr represents the correlated pvalue and p uncorr represents the uncorrelated p-value.
doi:10.1371/journal.pone.0163918.g004 Self-Contained Statistical Analysis of Gene Sets

Applying the method to T-ALL
Pediatric acute lymphoblastic leukemia (ALL) is the most common childhood cancer and the T-lineage subtype (T-ALL) has a poorer prognosis than B-lineage [33]. Many microarray analyses of T-ALL disease have been done, leading to identification of genes that are involved with the disease [34]. Whole-genome sequencing of twelve early T-cell precursor acute lymphoblastic leukemia (ETP ALL) patients revealed mutations in histone-modifying genes, genes related to cytokine receptor and RAS signalling, and lesions involving haematopoietic development [34]. Despite advances in treatment leading to a high cure rate, there are still significant therapeutic barriers in treating relapse disease [33]. Thus, there is interest in identify genetic signatures of T-ALL relapse disease. In addition, acute lymphoblastic leukemia is a disease where the success of treatment is linked to identifying the leukemia subtype and tailoring the treatment to the subtype [35]. Yeoh et al. [35] identify clusters of genes by using expression profiles of the the top genes for each subgroup. T-ALL is distinguished from other acute lymphoblastic leukemias by the CD3D gene. Subgoups of T-ALL can be determined through the stage of T-cell development and the T-ALL oncogenes: HOX11L2, LYL1 plus LMO2, TAL1 plus LMO1 or LMO2, HOX11, and MLL-ENL (Ferrando and Look [36], Pui et al. [37]). Classification can help determine prognosis based on treatment regimes. For example, MLL-ENL [37] and HOX11 (when treated with combination chemotherapy [36]) have favorable prognoses. HOX11L2 was shown to be a subtype of pediatric T-ALL with poor prognosis (Ballerini et al. [38]). Ferrando et al. [39] link T-ALL genes HOX11, TAL1, and LYL with immunophenotypic expression and stages of thymocyte differentiation. The less favorable prognosis of TAL1 and LYL1 subtypes could be attributed to upregulation of antiapoptotic genes (BCL2A1 or BCL2) [39]. According to Pui et al. [37], "many novel genomic alterations have recently been identified, including focal deletions leading to dysregulated expression of TAL1 and LMO2, deletion and mutation of PTEN, mutations of NOTCH1 and FBXW7, deletions of RB1, duplication of MYB, deletions of RB1, and fusion of SET or ABL1 to NUP214, " confirming that T-ALL is a heterogeneous disease. However, unraveling which genes are the drivers and which are passengers in gene expression analysis can be a challenge [37].
Maiorov et al. [40] use a network-based classification scheme and compare T-ALL patients with normal patients using Gene Expression Omnibus databases GSE13204 and GSE46170. They identify 19 significant subnetworks containing 102 genes and conclude that, "transcription factors, zinc-ion-binding proteins, and tyrosine kinases are the important protein families to trigger T-ALL. " Maiorov et al. assemble the following genes {1. ABL1, 2. CCL5, 3. CD99, 4. TP53, 5. WT1} which have been linked with T-ALL from associated studies and which have been identified in their subnetworks. We calculate the p-values of these genes using the t-test and database GSE13204 to be respectively 3.9 × 10 −31 , 3.0 × 10 −34 , 2.8 × 10 −66 , 1.1 × 10 −8 , and 3.1 × 10 −22 which confirms their significance.

Gene Expression Omnibus Accession Number GSE46170
Using our modified Fisher's method with p min = .001, the Gene Expression Omnibus dataset GSE46170, and the false discovery rate of.0025, we identify the following significant gene sets shown in Table 7 from BioCarta, KEGG, Reactome, and Hallmark which have been downloaded from the MSigDB database [2]. We link only one probe with each gene. The caption of Table 7 includes a description of each gene set from MSigDB [2]. According to [21], "RNA was isolated from the bone marrow samples of childhood T-ALL patients at the time of diagnosis with a blast count over 90% and hybridized to Affymetrix GeneChip HU-133 Plus.2. " Table 7 lists the database associated with each gene set, the number of genes in the set, and the consolidated p-value of each set. The number in parentheses is the number of genes actually found in GSE46170. For each individual gene, 31 T-ALL patients and 7 healthy patients were used to compute a p-value based on the Wilcoxon rank-sum test. A permutation method with 100,000 permutations was used to generate the consolidated p-value of the gene set. (Incidentally, Stouffer's method also identified genes sets tagged with an asterisk ( Ã ) using a false discovery rate of.0025.) We focus our attention on one of the gene sets in Table 7 (REACTOME_PRE_NOTCH_  TRANSCRIPTION_AND_TRANSLATION)  Descriptions of these individual genes can be found at http://software.broadinstitute.org/gsea/msigdb. Genes LOC441488 and LOC728030 are the only genes out of the 29 that were not located in the GSE46170 dataset and not included in the gene set analysis. The gene set, REACTOME_PRE_NOTCH_TRANSCRIP-TION_AND_TRANSLATION, compiles genes involved in "Pre-Notch transcription and translation" [2]. NOTCH1 is a transcription factor involved in "multiple stages of T-cell development" [41]. Mutations in NOTCH1 have been found in over 50% of T-ALL cases [41]. Fig 5 plots the −log 10 of the p-values of the genes in the set using a Wilcoxon rank-sum test. We see that gene 4. E2F3, 9. TNRC6B, 19. NOTCH3, 22. TNRC6C, and 25. TP53 are highly significant. Fig 6 plots P(A | B)/P(A) to show the dependencies among the genes. The multiplicative factor P(A | B)/P(A) is the increased probability that gene A is differentially expressed (at p-value.05 or less) given the differential expression of gene B (at p-value.05 or less). For example, the dark red squares are genes whose probability of being differentially expressed is 5-6 times higher if the gene it is paired with is differentially expressed. We see that the following gene pairs have a positive dependence: gene 3 (E2F1) and gene 24 (TFDP1); gene 7 Table 7. Gene sets and associated p-values that are differentially expressed (T-ALL versus Healthy) using Gene Expression Omnibus Accession GSE46170 and a False Discovery Rate of.0025. Individual genes within each set can be found at software.broadinstitute.org/gsea/msigdb [2]. Individual gene p-values are computed with the Wilcoxon rank-sum test. Gene sets identified with an asterisk (*) were also identified by Stouffer's method. Description of gene sets in Table 7 taken from Subramanian et al. [2]. 1. "Deregulation of CDK5 in Alzheimers Disease" 2. "Genes involved in Pre-NOTCH Transcription and Translation" 3. "Genes involved in Regulation of Complement cascade" 4. "Genes involved in p38MAPK events" 5. "Oxidative Stress Induced Gene Expression Via Nrf2" 6. "Genes involved in Signaling by BMP" 7. "Genes involved in Elevation of cytosolic Ca2+ levels" 8. "Genes up-regulated during formation of blood vessels (angiogenesis)" 9. "Genes involved in Synthesis, Secretion, and Inactivation of Glucose-dependent Insulinotropic Polypeptide (GIP)". Self-Contained Statistical Analysis of Gene Sets

Challenges posed by microarrays GSE13204 and GSE36133
When analyzing microarray databases from Gene Expression Omnibus Accession numbers GSE13204 and GSE36133, we find that many of the individual genes are differentially expressed at low p-values. Fig 7 shows the percent frequency of 20,705 genes as a function of pvalue for GSE13204 when comparing 174 T-ALL patients with 74 normal patients. The p-value of each gene was calculated using the t-test. The first bar plots the percentage of genes whose −log 10 (p) values range from 0 to 2 (or equivalently whose p-values range from 10 −2 to 1), the second bar plots the percentage of genes whose −log 10 (p) values range from 2 to 4 (or equivalently whose p-values range from 10 −2 to 10 −4 ), the third bar plots the percentage of genes whose −log 10 (p) values range from 4 to 6, etc. Thus, over 64% of the genes in GSE13204 have p-values of 10 −2 or lower. Not surprisingly, we find that many of the gene sets are also differentially expressed at very low p-values.
In an attempt to isolate a few biological themes among many differentially expressed gene sets, we decide to select the gene sets with the smallest p-values. We acknowledge that this approach will underreport many of the differentially expressed gene sets. However, due to the large number of sets, it would not be useful to report all the differentially expressed gene sets.
Furthermore, databases GSE13204 and GSE36133 require a prohibitively large number of permutations to differentiate the statistical significance of gene sets since the computed p-value of a gene set cannot be smaller than the reciprocal of the number of permutations used. We find that even with a million permutations, a large percentage of gene sets would share the Self-Contained Statistical Analysis of Gene Sets smallest p-value (1 × 10 −6 ) available. To overcome this problem, we extrapolate the correlated or permutation based p-value by computing coefficients m and b using the linear regression curve Eq (5) using 100,000 permutations. Eq (4) then allows us to extrapolate a very small permutation p-value using a regression line constructed with larger p-values. Tables 8 and 9 show the gene sets (from BioCarta, Kegg, Reactome, and Hallmark) with the highest level of differential expression using datasets GSE13204 and GSE36133 respectively with false discovery rates of 1 × 10 −70 and 3 × 10 −20 . The subset of GSE13204 we use contains 174 T-ALL patients and 74 normal patients. The subset of GSE36133 we use contains 13 T-ALL patients and 33 AML patients. The t-test is used to compute p-values of individual genes for GSE13204 and the Wilcoxon rank-sum test is used to compute p-values of individual genes for GSE36133. The largest permutation p-value of all the gene sets listed in Table 7 from GSE13204 is 2.6 × 10 −73 while the largest permutation p-value of the genes sets listed in Table 9 is 1.7 × 10 −22 . We use the modified Fisher's method with p min = 1 × 10 −8 for GSE13204 and p min = 10 −5 for GSE36133 along with a regression line to generate the permutation p-value for the gene set. Descriptions of the gene sets from Subramanian [2] are included in the table captions. We note that only large gene sets (sets with greater than 142 genes) are selected since smaller gene sets cannot achieve the very small p-values the larger gene sets can attain.
The most highly ranked gene sets in GSE13204 that differentiate T-ALL patients from healthy patients regulate unexpected mechanisms (UV response, hypoxia), signalling mechanisms (nerve growth factor (NGF) and KRAS), cell-matrix adhesions and apical junctions, the cytoskeleton, and endocytosis. RAS signalling and KRAS are identified as mutations in ETP ALL in Zhang et al. [34]. The most highly ranked gene sets in GSE36133 that differentiate T-ALL patients from AML patients regulate transcription factors, mitosis, response to cytokine stimulation, and apoptosis.

Discussion
Fisher's method is a self-contained method used to compute the consolidated p-value of a gene set. We show that Fisher's method has a high level of correlation with many other self-contained methods. We modify Fisher's method to require the differential expression of multiple individual genes in order to trigger the differential expression of the entire gene set. Dependencies among the gene sets can be computed during the permutation process and displayed using a heat map. Our method is applied to study the differential expression of precompiled gene sets from the MSigDB database. We use microarray databases GSE46170, GSE13204, and GSE36133 from the Gene Expression Omnibus to study the differential expression of gene sets for T-ALL vs Healthy patients and T-ALL vs AML patients and display the results in Tables 7,  8 and 9. We find that we need to extrapolate the permutation p-value for databases (GSE13204 and GSE36133) which contain a large percentage of highly differentially expressed genes.
From our gene set analysis, we are able to identify gene sets associated with Pre-NOTCH transcription and translation as well as genes down-regulated by KRAS activation which have been previously associated with T-ALL. We also identify many gene sets that may not have immediate ties to T-ALL in regards to its genetic signature, and which would require additional scrutiny of its individual genes.
We believe our self-contained method is innovative because: it requires the involvement of multiple individual genes; it is capable of displaying dependencies among genes; and it can compute the permutation p-value of highly differentially expressed gene sets. Future efforts would attempt to put large and small gene sets on equal statistical footing, since large gene sets tend to be selected over small gene sets, when a large portion of gene sets are differentially expressed.