Intra- and Inter-Individual Variance of Gene Expression in Clinical Studies

  • Wei-Chung Cheng,

    Affiliation Division of Pediatric Neurosurgery, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan

  • Wun-Yi Shu,

    Affiliation Institute of Statistics, National Tsing Hua University, Hsinchu, Taiwan

  • Chia-Yang Li,

    Affiliation The Division of Infectious Diseases, National Health Research Institutes, Miaoli, Taiwan

  • Min-Lung Tsai,

    Affiliation Institute of Athletics, National Taiwan Sport University, Taichung, Taiwan

  • Cheng-Wei Chang,

    Affiliation Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu, Taiwan

  • Chaang-Ray Chen,

    Affiliation Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu, Taiwan

  • Hung-Tsu Cheng,

    Affiliation Institute of Nanoengineering and Microsystem, National Tsing Hua University, Hsinchu, Taiwan

  • Tzu-Hao Wang,

    knoxtn@cgmh.org.tw (TW); ichsu@mx.nthu.edu.tw (IH)

    Affiliations Genomic Medicine Research Core Laboratory, Chang Gung Memorial Hospital, Taoyuan, Taiwan, Department of Obstetrics and Gynecology, Lin-Kou Medical Center, Chang Gung Memorial Hospital and Chang Gung University, Taoyuan, Taiwan

  • Ian C. Hsu

    knoxtn@cgmh.org.tw (TW); ichsu@mx.nthu.edu.tw (IH)

    Affiliation Department of Biomedical Engineering and Environmental Sciences, National Tsing Hua University, Hsinchu, Taiwan

Abstract

Background

Variance in microarray studies has been widely discussed as a critical topic in the identification of differentially expressed genes; however, few studies have addressed how the way variance is estimated influences the results.

Methodology/Principal Findings

To break intra- and inter-individual variance in clinical studies down into three levels (technical, anatomic, and individual), we designed experiments and algorithms to investigate the three forms of variance. As a case study, a group of “inter-individual variable genes” was identified to exemplify the influence of underestimated variance on the statistical and biological aspects of identifying differentially expressed genes. Our results showed that inadequate estimation of variance inevitably led to the inclusion of non-statistically significant genes among those listed as significant, thereby interfering with the correct prediction of biological functions. Applying a higher fold-change cutoff in the selection of significant genes reduces or eliminates the effects of underestimated variance.

Conclusions/Significance

Our data demonstrated that correct variance evaluation is critical in selecting significant genes. If the degree of variance is underestimated, “noisy” genes are falsely identified as differentially expressed genes. These genes add noise to the biological interpretation, reducing the biological significance of the gene set. Our results also indicate that applying a higher fold-change cutoff as the selection criterion reduces or eliminates the differences between distinct estimations of variance.

Introduction

Over the last decade, microarray studies have had a profound impact on transcriptomic research. One particularly important clinical application of microarray technology is the identification of differentially expressed genes, which may serve as biomarkers for the diagnosis and prognostic prediction of tumors or other complex diseases [1]–[3]. Despite many successful results, some studies have revealed that gene lists derived from similar studies are highly inconsistent [4]–[6]. Numerous investigations have been conducted to evaluate the influence of multiple factors, such as batch effects [7], dye effects [8], different platforms [9]–[13], various experiment designs [14]–[16], and statistical approaches [17], [18], on microarray results. However, few studies have explored the influence of different sources of variation on the identification of differentially expressed genes from microarray analysis.

Researchers have identified two major sources of variance in microarray studies: technical variance and biological variance [19]. All forms of variations influenced by experimental artifacts, such as the quality of RNA, batch effects, and experimental parameters, belong to technical variance. A well-conceived experimental design and execution as well as rigorous statistical analysis can reduce the effects of technical variation. Studies have demonstrated that loop designs are more efficient than reference designs in two color microarrays [14], [20], and many statistical methods can be used to increase the robustness of microarray data analysis [21], [22]. Several studies have concluded that the reproducibility of microarrays could be improved using standardized protocols and carefully designed and controlled experiments [12], [13], [23].

Biological variance is attributed to specimens, rather than procedures, and can be traced to several sources. Anatomic variance is caused by the heterogeneous distribution of cell types within a tissue specimen collected from a single individual [24]. Individual variance is a result of various genotypes and physiological states. For variation in genotypes, copy number variations (CNVs) [25], [26] and allele variations [27], [28] have been shown to influence gene expression levels. Physiological states, shaped by environmental factors, disease state, and other variables, also influence gene expression. Many researchers have reported biological variance in human blood [29], [30], lung [31], placenta [32], retina [33], and other tissues [34]–[37]. In addition, variations in gene expression have been identified among individuals as well as populations [38]–[40] and species [19], [40], [41]. However, the effects of applying different levels of variance have not been well addressed.

In this study, we used the normal human placenta as a model to evaluate technical, anatomic, and individual variance. Each of these types of variation should be considered in clinical studies. The “inter-individual variable gene” was used as an example to evaluate the influence of estimating variance on microarray results. We profiled three levels of variance in human clinical studies and addressed the importance of estimating variance on the statistical and biological aspect for microarray studies. Our data demonstrated that correct variance evaluation is critical in selecting significant genes.

Materials and Methods

Specimen Collection and Processing

Eleven normal placental tissues were obtained from 9 healthy individuals who underwent cesarean section before the onset of labor [42]. This study was approved by the Institutional Review Board of Chang Gung Memorial Hospital (IRB#96-0630B). Inclusion criteria were healthy normotensive term pregnancies with appropriate-for-gestational-age fetuses that displayed no abnormality on routine ultrasound scans. Exclusion criteria for this study were fetal chromosomal abnormalities, pre- and postnatal malformations or phenotypic anomalies, maternal smoking, maternal obesity, and maternal diseases, such as autoimmune diseases, thrombophilic conditions, and diabetes [43]. The clinical information is summarized in Table 1. Placental specimens were obtained from the same region of the placenta (5 cm away from the site of cord insertion) immediately after delivery. The placental cross section, approximately 2.5 cm thick, was divided into three equal parts: maternal (includes the thin basal plate), middle, and fetal (includes the chorionic plate) [32]. We analyzed the middle part of the placental tissues in all of our placental studies [42], [44]. The tissues were snap frozen in liquid nitrogen and stored at −80°C. The first sample group (G1) comprised Samples 1 to 9 from 9 individuals. The second sample group (G2) contained Samples 8–1, 8–2, and 8–3, which were 3 different placental tissues taken from the same individual. The third sample group (G3) consisted of 2 technical replicates, Samples 8–3_1 and 8–3_2, using the identical RNA pool (Figure 1).

Figure 1. Microarray experimental design.

Three kinds of samples were employed in this study. Individual variance was evaluated using the first sample group (G1), comprising Samples 1 to 9 of nine individuals. The second sample group (G2) was used to evaluate anatomic variance. It contained Samples 8–1, 8–2, and 8–3, taken from three different sections of placenta from the same individual. The third sample group (G3) consists of two technical replicates, Samples 8–3_1 and 8–3_2, using an identical RNA pool for microarray hybridization to evaluate technical variance. The expression of Sample 8–3 could be estimated by the mean expression of Samples 8–3_1 and 8–3_2. The mean expression of Samples 8–1, 8–2, and 8–3 represented the expression of Sample 8.

https://doi.org/10.1371/journal.pone.0038650.g001

RNA Extraction and Microarray Hybridization

Total RNA was isolated as previously reported [45]. Because the purpose of this study was to analyze variance of gene expression that may be commonly encountered at the tissue level, we did not isolate individual cell types from whole tissues. During RNA extraction, 1 ml of Trizol reagent (Life Technologies, Rockville, MD) was added to every 50–100 mg of pulverized frozen placental tissue. Total RNA was quantified by UV absorption at 260 nm, and RNA quality was examined using the Agilent 2100 bioanalyzer (Agilent Technologies, USA). cDNA labeling was conducted using a 3DNA Array 50™ kit (Genisphere, Hatfield, PA), according to the manufacturer’s protocols. In brief, 20 µg of total RNA was used to perform the reverse transcription reaction with SuperScript II RNase H- reverse transcriptase and specific primers (Invitrogen Life Technologies, USA). All synthesized tagged cDNA targets were then purified using the Microcon YM-30 column (Millipore, USA). The purified targets and fluorescent 3DNA reagents were hybridized to the arrays in succession. Arrays were sealed in a homemade hybridization chamber that adapted the design provided in M-Guide (Patrick O. Brown laboratory, Stanford University, USA). Hybridization was performed at 65°C in a water bath for 16 h, and arrays were washed according to the manufacturer’s protocol (http://www.genisphere.com/pdf/array50v2_10_19_04.pdf). Subsequently, arrays were scanned with GenePix 4100A (Axon Instruments, USA) and images were acquired using GenePix Pro 5.0 software (Axon Instruments, USA).

Production of Microarrays

We originally ordered 9600 human cDNA clones of the IMAGE library from Incyte Genomics (Palo Alto, Calif, USA), where the clones were also sequenced. Only 7334 clones passed sequence verification by Incyte Genomics and were shipped to us. Therefore, every clone of this 7334-clone cDNA library had an IMAGE ID, DNA sequences, vector names, and information for PCR primers [45]. All clones were further amplified by PCR and purified by isopropanol precipitation in 96-well plates. The purified DNAs were resuspended in 3×SSC for spotting. A single microarray slide (CMT-GAPSII, Corning Inc., USA) contains 7334 human cDNA probes in quadruplicate, 10 spike-in genes (SpotReport™-10 Array Validation System, Stratagene, USA), and one housekeeping gene, β-actin, in 96 replicates. Each array had 32,448 spots. The arrays were post-processed as recommended in the Corning UltraGAPS Coated Slides Instruction Manual. Microarray slides were produced in a well-controlled environment (28±2°C and 48±1% humidity) and stored under desiccation until use. The array system was assembled according to M-Guide (Patrick O. Brown laboratory, Stanford University, USA) and controlled using ArrayMaker, version 2.5.1 (Joseph DeRisi laboratory, UCSF, USA) [46]. A rigorous system commissioning was performed to guarantee the quality of the printed arrays. Before hybridization, the slides were preprocessed according to the instruction manual for the Corning UltraGAPS Coated Slides, including rehydration, snap-dry, UV-crosslinking, baking, and surface blocking. DNAs were UV-crosslinked with 300 mJ/cm² using the Stratalinker 2400 UV Crosslinker (Stratagene, USA).

Microarray Data Analysis

The logarithm of the ratios for all valid spots on each array was normalized by locally weighted linear regression (LOWESS). Descriptions of the microarray data preprocessing can be found in our previous study [47]. The normalized log ratios were then processed gene by gene using a log linear model [47], [48]. This model describes the normalized log ratio $y_{ij}$ of an array comparing samples $i$ and $j$ as follows:

$$y_{ij} = \gamma + \lambda_i - \lambda_j + \varepsilon_{ij},$$

where $\gamma$ represents the relative labeling efficiency between dyes, $\lambda_i$ is $\log_2$ (expression of sample $i$/mean expression of all samples) for one specific cDNA clone, with $\sum_i \lambda_i = 0$, and $\varepsilon_{ij}$ is the random error with mean 0 and variance $\sigma^2$, the variance for one specific cDNA clone. For each clone, $\lambda_i$ and $\sigma$ are estimated from the observed data by using the least squares method as $\hat{\lambda}_i$ and $\hat{\sigma}$. When the data had been processed using the log linear model, 5501 genes could be calculated in the model without singularity. $\lambda_{\text{8-3}}$ is estimated by the mean of $\hat{\lambda}_{\text{8-3\_1}}$ and $\hat{\lambda}_{\text{8-3\_2}}$. $\lambda_{8}$ is estimated by the mean of $\hat{\lambda}_{\text{8-1}}$, $\hat{\lambda}_{\text{8-2}}$, and $\hat{\lambda}_{\text{8-3}}$. A further description of the statistical model can be found in Methods S1. We have developed a Web tool for loop-design microarray data analysis [49]. All of the front-end analyses of our microarray data were conducted using this publicly available Web tool. The microarray data of this work are MIAME compliant and have been deposited in the GEO of NCBI (accession number: GSE27646).
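The least squares step can be illustrated with a short sketch. It assumes the log linear model has the common loop-design form y_ij = γ + λ_i − λ_j + ε with the λ values summing to zero; that form, the function names `solve_linear` and `fit_loop_model`, and the 0-based sample indices are assumptions made for illustration and are not taken from the authors' Web tool.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Corresponds to a clone that cannot be fitted "without singularity".
            raise ValueError("singular design for this clone")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_loop_model(arrays, n_samples):
    """Least squares fit of the ASSUMED model y_ij = gamma + lambda_i - lambda_j + error.

    arrays: list of (i, j, y) tuples for one clone, with 0-based sample indices
            and y the normalized log2 ratio of that array.
    Returns (gamma, lambdas, sigma2); lambdas sum to zero by construction.
    """
    k = n_samples
    p = k  # unknowns: gamma plus k - 1 free lambdas (the last one is minus their sum)
    rows = []
    for i, j, _ in arrays:
        row = [0.0] * p
        row[0] = 1.0
        for s, sign in ((i, 1.0), (j, -1.0)):
            if s < k - 1:
                row[1 + s] += sign
            else:
                for t in range(k - 1):
                    row[1 + t] -= sign
        rows.append(row)
    ys = [y for _, _, y in arrays]
    # Normal equations (A^T A) beta = A^T y.
    ata = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    aty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(p)]
    beta = solve_linear(ata, aty)
    lambdas = beta[1:] + [-sum(beta[1:])]
    resid = [y - sum(c * b for c, b in zip(r, beta)) for r, y in zip(rows, ys)]
    dof = len(arrays) - p
    sigma2 = sum(e * e for e in resid) / dof if dof > 0 else float("nan")
    return beta[0], lambdas, sigma2
```

Calling `fit_loop_model` once per clone, with one tuple per hybridization (arrow) in Figure 1, would return that clone's estimated dye effect, per-sample log2 relative expression, and residual variance. Clones for which the normal equations are singular raise an error, mirroring the exclusion of genes that could not be calculated in the model.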

Differential Expression and Averaged Fold Change

Differential expression is $\log_2$ (fold change of 2 samples) for one specific cDNA clone and is denoted as $\hat{\lambda}^x_i - \hat{\lambda}^x_j$, where $x$ is the index denoting clones and $i$, $j$ denote samples. Differential expression profiles in Figure 2a are the histograms of data sets S1: $\{\hat{\lambda}^x_i - \hat{\lambda}^x_j \mid i, j \in \mathrm{G1}\}$, S2: $\{\hat{\lambda}^x_i - \hat{\lambda}^x_j \mid i, j \in \mathrm{G2}\}$, and S3: $\{\hat{\lambda}^x_i - \hat{\lambda}^x_j \mid i, j \in \mathrm{G3}\}$, which are the sets of all $\hat{\lambda}^x_i - \hat{\lambda}^x_j$ when $x$ runs over all clones and $(i, j)$ runs over all possible pairs in G1, G2, and G3, respectively. For S1, $i$ and $j$ range from 1 to 9. For S2, $i$ and $j$ range from 8–1 to 8–3. For S3, $i$ and $j$ are 8–3_1 and 8–3_2, respectively. Moreover, averaged fold change is estimated by

$$2^{\langle |\hat{\lambda}^x_i - \hat{\lambda}^x_j| \rangle},$$

where $\langle |\hat{\lambda}^x_i - \hat{\lambda}^x_j| \rangle$ denotes the mean over absolute expression differences of all possible sample pairs $(i, j)$ for clone $x$. It is the indicator of fold change for individual variance.

Figure 2. Profiles of the three kinds of variance.

(a) The distribution of the differential expression for the three forms of variance. The differential expression for the three forms of variance was estimated by the data sets S1, S2, and S3, that is, $\hat{\lambda}^x_i - \hat{\lambda}^x_j$ for any possible pair of $i$ and $j$ in G1, G2, and G3, respectively. (b) D1, D2, and D3 are the probability density distributions of the D quantity obtained by the permutation method using the data series S1, S2, and S3 when considering individual, anatomic, and technical variance, respectively.

https://doi.org/10.1371/journal.pone.0038650.g002
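As a rough illustration of the quantities defined in this section, the sketch below computes the pairwise log2 differences for one clone and the averaged fold change. It assumes the averaged fold change is 2 raised to the mean absolute log2 difference, which is consistent with the text but is an assumption about the exact formula. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_differences(lambdas):
    """Log2 fold changes lambda_i - lambda_j over all unordered sample pairs i < j."""
    return [a - b for a, b in combinations(lambdas, 2)]


def averaged_fold_change(lambdas):
    """ASSUMED form: 2 ** (mean of |lambda_i - lambda_j| over all sample pairs)."""
    diffs = pairwise_differences(lambdas)
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    return 2.0 ** mean_abs


# Example for one clone measured in three samples (values are made up):
# the pairwise differences are 1, 2, and 1, so the mean absolute difference is 4/3.
example = averaged_fold_change([1.0, 0.0, -1.0])
```

Unordered pairs are used here; including both orderings of each pair would mirror the histogram around zero but leave the mean absolute difference, and hence the averaged fold change, unchanged.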

Statistical Test

We designed a test statistic, the D quantity, to describe the variation of gene expression between samples. D is computed for each clone x as a summation over every dual-color microarray experiment (represented by an arrow in Figure 1), where i is the sample represented by the tail of the arrow, j is the sample represented by the head of the arrow, and n is the number of sample pairs (i, j). We used the sampling permutation method to describe the D quantity when considering the three levels of variance (Methods S1). D1, D2, and D3 are the distributions of the D quantity obtained from 10 million sampling permutations, each taking n data points at one time from S1, S2, and S3, respectively. The corresponding p values of the D quantity are determined using the smoothed curve of the probability density in Figure 2b. The significance criterion for the statistical test in this study is a false discovery rate (FDR) of 5%.
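The sampling permutation idea can be sketched as follows. The explicit formula of the D quantity is not reproduced in this text, so the sketch uses a stand-in (the mean squared differential expression) that is purely an assumption. It also replaces the smoothed density curve of Figure 2b with a plain empirical tail probability and uses far fewer than 10 million draws. All function names are hypothetical.

```python
import random
from bisect import bisect_left


def d_quantity(diffs):
    """HYPOTHETICAL stand-in for D: mean squared differential expression."""
    return sum(d * d for d in diffs) / len(diffs)


def null_distribution(pool, n, n_draws=10_000, seed=0):
    """Draw n values at a time (without replacement) from a pooled data series
    such as S2 or S3 and return the sorted D values of all draws."""
    rng = random.Random(seed)
    return sorted(d_quantity(rng.sample(pool, n)) for _ in range(n_draws))


def empirical_p_value(observed, null_sorted):
    """Fraction of null D values at least as large as the observed D,
    with an add-one correction so the estimate is never exactly zero."""
    n_ge = len(null_sorted) - bisect_left(null_sorted, observed)
    return (n_ge + 1) / (len(null_sorted) + 1)
```

Under these assumptions, a p value against anatomic variance would use a null distribution built from S2, and a p value against technical variance would use one built from S3, with `observed` being the clone's D value over the hybridizations in the loop.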

Functional Enrichment Analysis

Gene Ontology (GO)-based functional enrichment analysis was used to measure gene enrichment in annotation terms for the inter-individual variable genes. The significance score in Table 2 is –log (EASE Score), where the EASE Score is a modified Fisher exact p value [50] obtained by DAVID. GO terms that passed the criteria (EASE Score <0.1 and at least 2 genes in the GO term) were considered for further comparison. Only 11 GO terms were common to all selection criteria, and these are shown in Table 2.
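To make the significance score concrete, the sketch below computes a one-sided Fisher exact (hypergeometric upper-tail) p value with the number of list hits reduced by one, which is how the EASE Score is commonly described. DAVID's exact contingency-table construction may differ in detail, and the base-10 logarithm is an assumption because the text writes only "–log". The function names and the example counts are hypothetical.

```python
from math import comb, log10


def hypergeom_upper_tail(k, list_size, pop_hits, pop_size):
    """P(X >= k) when list_size genes are drawn from pop_size genes,
    pop_hits of which are annotated with the GO term."""
    if k <= 0:
        return 1.0
    total = comb(pop_size, list_size)
    upper = min(list_size, pop_hits)
    favorable = sum(
        comb(pop_hits, x) * comb(pop_size - pop_hits, list_size - x)
        for x in range(k, upper + 1)
    )
    return favorable / total


def ease_score(list_hits, list_size, pop_hits, pop_size):
    """Conservative variant of the Fisher exact p value: one hit is removed
    from the gene list before the upper tail is evaluated (ASSUMED definition)."""
    return hypergeom_upper_tail(list_hits - 1, list_size, pop_hits, pop_size)


def significance_score(list_hits, list_size, pop_hits, pop_size):
    """-log10(EASE Score); the logarithm base is an assumption."""
    return -log10(ease_score(list_hits, list_size, pop_hits, pop_size))
```

Because one hit is removed, a term supported by a single gene always gets an EASE Score of 1, which is in line with requiring at least 2 genes per GO term.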

Table 2. Significant score of Gene Ontology terms for the significant gene sets determined by distinct significant criteria.

https://doi.org/10.1371/journal.pone.0038650.t002

Results

Demographics of Studied Subjects

Analyzed placental tissues were collected from 9 healthy pregnant women, whose clinical information is listed in Table 1. All the pregnant women were free of hypertension, diabetes mellitus, preterm labor, and other medical diseases. All neonates were born at term and with normal body weight and healthy vital signs that were evaluated with Apgar scores at 1 min and 5 min after delivery, as used previously [42]–[44].

The Profiles of 3 Levels of Variance

We used a loop design in a microarray analysis of normal placental tissues to investigate technical, anatomic, and individual variance in microarray data. Figure 1 is a schematic representation of the interwoven loop hybridization design performed in this study. We selected 11 normal placental tissues from 9 women with term pregnancies, who underwent cesarean section prior to the onset of labor, to avoid variations caused by labor pain. Microarray data were obtained from 3 sample groups to estimate individual, anatomic, and technical variance. The first sample group (G1) comprised Samples 1 to 9, taken from 9 individuals. The second sample group (G2) contained Samples 8–1, 8–2, and 8–3, which were 3 different placental regions taken from the same individual. The third sample group (G3) consisted of 2 technical replicates, Samples 8–3_1 and 8–3_2, obtained from the same RNA pool. The differential expression profiles in Figure 2a show the log (fold change) between samples within each of the 3 sample groups (G1, G2, and G3) as histograms of the data series S1, S2, and S3, respectively. The distribution curves narrow progressively from S1 to S3, revealing that individual difference produced a greater degree of relative variability in gene expression than anatomic or technical difference.

A test statistic, D quantity, was designed to measure the variation in gene expression between samples. Figure 2b shows the probability density profiles of the D quantity, D1, D2, and D3, representing 3 levels of variability. These profiles were generated by applying permutation methods using the data series S1, S2, and S3, indicating extreme differences in the 3 levels of variance.

Case Study: Inter-individual Variable Gene

In this study, inter-individual variable genes, whose expression varies highly between individuals, were used to evaluate the importance of estimating variance. When defining inter-individual variable genes according to the D quantity, variations in gene expression were required to exceed the level of anatomic variance. Therefore, when anatomic variance was considered in the significance test, Pa is the p value of the D quantity determined by the D2 curve in Figure 2b. When anatomic variance is not considered in the experimental design, technical variance, evaluated by technical replication, is commonly used for the significance test. Pt is the p value of the D quantity determined by technical variance (D3 curve in Figure 2b).

Figure 3a plots averaged fold change versus 2 corresponding p values (Pa and Pt) for each gene. When FDR 5% was set as significant, 2 groups of significant genes were obtained. The 2 corresponding cutoff p values are indicated by red arrows in Figure 3b. Averaged fold change was used as another criterion to select inter-individual variable genes. In this study, the 4 averaged fold changes, from 1.2 to 1.5 (the gray arrows in Figure 3b), served as further criteria for the identification of inter-individual variable genes.

Figure 3. The scatter plot of averaged fold change and p values, and the selection of inter-individual variable gene.

(a) The scatter plot of log2 (averaged fold change) and –log (p value). Pa is the p value determined by applying anatomic variance. Pt is the p value determined by applying technical variance. (b) The enlarged area of the rectangle in (a). The red arrows indicate the corresponding p value of FDR 5%. The gray arrows indicate the averaged fold change criteria: 1.2, 1.3, 1.4, and 1.5. (c) The number of inter-individual variable genes selected by the criteria of FDR 5%, evaluated by technical and anatomic variance (the red arrows in Figure 3b), and distinct averaged fold changes (the gray arrows in Figure 3b).

https://doi.org/10.1371/journal.pone.0038650.g003

We investigated sets of inter-individual variable genes generated according to distinct selection criteria (different averaged fold changes and corresponding p values) to evaluate the effects of differing levels of variance. Figure 3c shows the number of significant genes identified using 2 variance criteria, Pt and Pa (the red arrows in Figure 3b), with different averaged fold changes (the gray arrows in Figure 3b). When a higher averaged fold change was used, the influence of variance underestimation decreased, as shown by the number of significant genes (Figure 3c), but at the cost of a smaller number of selected genes. The difference was eliminated when the cutoff value of averaged fold change was set to greater than 1.3.
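The two-part selection rule described above (an FDR of 5% combined with an averaged fold change cutoff) can be sketched as follows. The text does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure is assumed here. The function names, gene labels, p values, and fold changes are all made up for illustration.

```python
def bh_cutoff(p_values, fdr=0.05):
    """Largest p value accepted by the Benjamini-Hochberg procedure
    (ASSUMED FDR method), or None if nothing passes."""
    m = len(p_values)
    cutoff = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= fdr * rank / m:
            cutoff = p
    return cutoff


def select_genes(p_values, fold_changes, fc_min, fdr=0.05):
    """Genes passing both the FDR criterion and the averaged fold change cutoff.

    p_values and fold_changes are dicts keyed by gene identifier."""
    cutoff = bh_cutoff(list(p_values.values()), fdr)
    if cutoff is None:
        return set()
    return {
        g for g, p in p_values.items()
        if p <= cutoff and fold_changes[g] >= fc_min
    }


# Made-up example: Pt (technical variance) is more permissive than Pa (anatomic).
p_t = {"a": 0.001, "b": 0.004, "c": 0.03, "d": 0.5}
p_a = {"a": 0.001, "b": 0.2, "c": 0.3, "d": 0.9}
fc = {"a": 1.6, "b": 1.25, "c": 1.1, "d": 1.0}

extra_at_low_cutoff = select_genes(p_t, fc, 1.2) - select_genes(p_a, fc, 1.2)
extra_at_high_cutoff = select_genes(p_t, fc, 1.5) - select_genes(p_a, fc, 1.5)
```

In this toy input, gene "b" is selected only under the technical-variance p values at a cutoff of 1.2, while at a cutoff of 1.5 both criteria select the same single gene, which mimics the pattern reported for Figure 3c.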

To evaluate the influence of variance underestimation on biological prediction, the gene lists identified using the criteria in Figure 3c underwent functional enrichment analysis for gene ontology (GO) using DAVID bioinformatics resources 6.7 [50]. Among all significant genes listed in Figure 3c, only 11 common GO terms were identified. Table 2 shows enrichment analysis results of the 11 GO terms for the significant genes listed when applying anatomic and technical variance with the averaged fold change criteria 1.2 and 1.3. The enrichment results of averaged fold change set at 1.4 and 1.5 were not listed because 2 significant gene lists based on anatomic and technical variance were the same. A significance score was defined as -log (p value), where the p value represented the significance of each GO term, according to a modified Fisher exact test in DAVID bioinformatics resources 6.7. Hence, a higher significance score represents a higher significance for the result.

For the same GO term, the significance score for the gene set, the p value of which was deduced by applying anatomic variance, was usually higher than that defined by technical variance (Table 2). This suggests that the lists of significant genes based on technical variance might include “noisy” genes, which reduced the significance of the GO terms.

Discussion

Even in a system as simple as a single cell, physiology is governed by various networks, each comprising multiple signaling gene products that interact through positive and negative feedback, as we showed previously [51]. Complexity theory, also known as chaos theory (http://en.wikipedia.org/wiki/chaos_theory), has been developed (http://sbs-xnet.sbs.ox.ac.uk/complexity/complexity_home.asp) to better describe the emergent phenomena of the cell. Clinical studies investigating the clinical outcomes of individuals [52] often derive results full of noise, which can be further grouped into intra- and inter-individual variance. Therefore, devising analytical approaches to dissect these confounding factors is critical.

In this study, we first collected placental tissues only from carefully selected healthy term pregnancies, avoiding any potential effects from maternal or fetal diseases. For a single organ, different regions may have distinctly specialized functions, leading to variations in gene expression [31], [32]. However, this type of variation differs between organs. The anatomic variance identified in this study was the heterogeneous distribution of cell types within a tissue specimen [53], prevalent in general clinical studies. Therefore, all tissues in this study were obtained from the same regions and same layer of the placenta to avoid biological variance among different regions of the placenta [32]. We did not isolate fetal trophoblasts from maternal endothelial cells in each placental tissue because we attempted to analyze the intra- and inter-individual variance directly from clinical tissues. To achieve this goal, we used a loop-designed method to increase the statistical power of microarray data analysis.

We used a test statistic, the D quantity, in this study to describe variations in gene expression between samples. The permutation method was employed to describe the characteristics of the 3 levels of variability. Permutation analysis is frequently adopted for microarray studies [54]–[59] because distributional assumptions (e.g., normal) using microarray data are often questionable [54]. A non-parametric approach considering factors such as non-uniform distributions could exhibit the characteristics of data more appropriately. The profiles shown in Figure 2 illustrate the differences in the 3 levels of variability, demonstrating that the evaluation of the correct variance must be considered in the experimental design to define statistically significant genes.

For the selection of significant genes, the results of phase I of the MicroArray Quality Control (MAQC) project suggest that the inter-platform reproducibility of enriched KEGG pathways and GO terms was markedly increased when fold-change ranking in addition to a non-stringent p value cutoff was used as the selection criterion [60]. Thus, we used a non-stringent p value, FDR 5%, with averaged fold change as the selection criteria. However, the relationship between the stringency of fold change and biological significance remains controversial. We compared the use of 4 averaged fold changes as criteria to identify the GO terms common to all selection criteria. Pan et al. suggested that the robustness of biological conclusions derived from microarray analysis should be routinely assessed by examining the validity of the conclusions using a range of threshold parameters [61]. Hence, common GO terms are representative functions for inter-individual variable genes. In this manner, the influence of variance underestimation could be evaluated by using the significance scores of the common GO terms. The significance scores of canonical pathways have previously been used to assess distinct selection criteria [62].

The identification of inter-individual variable genes through different variance levels demonstrates the importance of estimating variance from the statistical and biological viewpoints. From the statistical aspect, variance underestimation causes non-statistically significant genes to be included in the gene list (Figure 3c). From the biological aspect, significance scores of GO terms were used to evaluate the gene sets from distinct criteria. Table 2 shows a summary of biological evidence for evaluating gene sets with different significance criteria. It also shows that significant gene sets with accurate evaluation of variance provided more accurate biological interpretations. Our results also suggest that applying a higher cutoff point of fold change reduced, or even eliminated, the influence of variance underestimation. This may be a solution to overcome the difficulties associated with the identification of significant genes when the estimation of precise variance has not been considered adequately in the experimental design, although it comes at the cost of a shorter final gene list.

This study demonstrated the importance of estimating variance. Different types of biological variance should be considered, depending on the objectives of a particular study. For example, when using tumor and normal tissues collected from the same individual to study the signature of a cancer [63], anatomic variance should be considered. In clinical studies seeking to identify biomarkers for cancer classification, in which the subjects of the experiment are of the same race, individual variance should be considered. When experimental subjects of clinical studies include individuals from different races, inter-population variance should be considered. Different sampling contributes different levels of variance, and such factors should be considered in the experimental design and statistical model. Our results indicate that “noisy” genes are falsely identified as differentially expressed genes when the level of variance is underestimated, and that applying a higher fold change as the selection criterion reduces or eliminates the differences between distinct estimations of variance.

Supporting Information

Methods S1.

Detailed description of the statistical model and sampling permutation method.

https://doi.org/10.1371/journal.pone.0038650.s001

(DOC)

Author Contributions

Conceived and designed the experiments: WC WS. Performed the experiments: WC CL MT. Analyzed the data: WC CWC CRC. Contributed reagents/materials/analysis tools: HC TW IH. Wrote the paper: WC IH.

References

  1. van ‘t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415: 530–536.
  2. Hoshida Y, Villanueva A, Kobayashi M, Peix J, Chiang DY, et al. (2008) Gene expression in fixed tissues and outcome in hepatocellular carcinoma. N Engl J Med 359: 1995–2004.
  3. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503–511.
  4. Ein-Dor L, Kela I, Getz G, Givol D, Domany E (2005) Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 21: 171–178.
  5. Michiels S, Koscielny S, Hill C (2005) Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365: 488–492.
  6. Tan PK, Downey TJ, Spitznagel EL, Xu P, Fu D, et al. (2003) Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res 31: 5676–5684.
  7. Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, et al. (2010) Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 11: 733–739.
  8. Liang M, Briggs AG, Rute E, Greene AS, Cowley AW (2003) Quantitative assessment of the importance of dye switching and biological replication in cDNA microarray studies. Physiol Genomics 14: 199–207.
  9. Severgnini M, Bicciato S, Mangano E, Scarlatti F, Mezzelani A, et al. (2006) Strategies for comparing gene expression profiles from different microarray platforms: application to a case-control experiment. Anal Biochem 353: 43–56.
  10. Patterson TA, Lobenhofer EK, Fulmer-Smentek SB, Collins PJ, Chu TM, et al. (2006) Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat Biotechnol 24: 1140–1150.
  11. Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, et al. (2006) The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24: 1151–1161.
  12. Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, et al. (2005) Multiple-laboratory comparison of microarray platforms. Nat Methods 2: 345–350.
  13. Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J (2005) Independence and reproducibility across microarray platforms. Nat Methods 2: 337–344.
  14. Vinciotti V, Khanin R, D’Alimonte D, Liu X, Cattini N, et al. (2005) An experimental evaluation of a loop versus a reference design for two-channel microarrays. Bioinformatics 21: 492–501.
  15. Kerr MK (2003) Design considerations for efficient and effective microarray studies. Biometrics 59: 822–828.
  16. Kerr MK, Churchill GA (2001) Experimental design for gene expression microarrays. Biostatistics 2: 183–201.
  17. Jeffery IB, Higgins DG, Culhane AC (2006) Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics 7: 359.
  18. Jeanmougin M, de Reynies A, Marisa L, Paccard C, Nuel G, et al. (2010) Should we abandon the t-test in the analysis of gene expression microarray data: a comparison of variance modeling strategies. PLoS ONE 5: e12336.
  19. Whitehead A, Crawford DL (2006) Variation within and among species in gene expression: raw material for evolution. Mol Ecol 15: 1197–1211.
  20. Kerr MK, Churchill GA (2007) Statistical design and the analysis of gene expression microarray data. Genet Res 89: 509–514.
  21. Manoli T, Gretz N, Grone HJ, Kenzelmann M, Eils R, et al. (2006) Group testing for pathway analysis improves comparability of different microarray datasets. Bioinformatics 22: 2500–2506.
  22. Jung SH (2010) Sample size and power calculation for molecular biology studies. Methods Mol Biol 620: 203–218.
  23. Bammler T, Beyer RP, Bhattacharya S, Boorman GA, Boyles A, et al. (2005) Standardizing global gene expression analysis between laboratories and across platforms. Nat Methods 2: 351–356.
  24. van Beek EA, Bakker AH, Kruyt PM, Hofker MH, Saris WH, et al. (2006) Intra- and interindividual variation in gene expression in human adipose tissue. Pflugers Arch.
  25. Hollox EJ, Armour JA, Barber JC (2003) Extensive normal copy number variation of a beta-defensin antimicrobial-gene cluster. Am J Hum Genet 73: 591–600.
  26. Heidenblad M, Lindgren D, Veltman JA, Jonson T, Mahlamaki EH, et al. (2005) Microarray analyses reveal strong influence of DNA copy number alterations on the transcriptional patterns in pancreatic cancer: implications for the interpretation of genomic amplifications. Oncogene 24: 1794–1801.
  27. Yan H, Yuan W, Velculescu VE, Vogelstein B, Kinzler KW (2002) Allelic variation in human gene expression. Science 297: 1143.
  28. Bray NJ, Buckland PR, Owen MJ, O’Donovan MC (2003) Cis-acting variation in the expression of a high proportion of genes in human brain. Hum Genet 113: 149–153.
  29. Cheung VG, Conlin LK, Weber TM, Arcaro M, Jen KY, et al. (2003) Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet 33: 422–425.
  30. Whitney AR, Diehn M, Popper SJ, Alizadeh AA, Boldrick JC, et al. (2003) Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci U S A 100: 1896–1901.
  31. Gruber MP, Coldren CD, Woolum MD, Cosgrove GP, Zeng C, et al. (2006) Human lung project: evaluating variance of gene expression in the human lung. Am J Respir Cell Mol Biol 35: 65–71.
  32. Sood R, Zehnder JL, Druzin ML, Brown PO (2006) Gene expression patterns in human placenta. Proc Natl Acad Sci U S A 103: 5478–5483.
  33. Chowers I, Liu D, Farkas RH, Gunatilaka TL, Hackam AS, et al. (2003) Gene expression variation in the adult human retina. Hum Mol Genet 12: 2881–2893.
  34. Oleksiak MF, Churchill GA, Crawford DL (2002) Variation in gene expression within and among natural populations. Nat Genet 32: 261–266.
  35. Pritchard CC, Hsu L, Delrow J, Nelson PS (2001) Project normal: defining normal variance in mouse gene expression. Proc Natl Acad Sci U S A 98: 13266–13271.
  36. Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, et al. (2003) Genetics of gene expression surveyed in maize, mouse and man. Nature 422: 297–302.
  37. Jin W, Riley RM, Wolfinger RD, White KP, Passador-Gurgel G, et al. (2001) The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat Genet 29: 389–395.
  38. Li J, Liu Y, Kim T, Min R, Zhang Z (2010) Gene expression variability within and between human populations and implications toward disease susceptibility. PLoS Comput Biol 6.
  39. Whitehead A, Crawford DL (2006) Neutral and adaptive variation in gene expression. Proc Natl Acad Sci U S A 103: 5425–5430.
  40. Stevens VM, Pavoine S, Baguette M (2010) Variation within and between closely related species uncovers high intra-specific variability in dispersal. PLoS ONE 5: e11123.
  41. Kliebenstein DJ (2008) A role for gene duplication and natural variation of gene expression in the evolution of metabolism. PLoS ONE 3: e1838.
  42. Peng HH, Kao CC, Chang SD, Chao AS, Chang YL, et al. (2011) The effects of labor on differential gene expression in parturient women, placentas, and fetuses at term pregnancy. Kaohsiung J Med Sci 27: 494–502.
  43. Wang CN, Chang SD, Peng HH, Lee YS, Chang YL, et al. (2010) Change in amniotic fluid levels of multiple anti-angiogenic proteins before development of preeclampsia and intrauterine growth restriction. J Clin Endocrinol Metab 95: 1431–1441.
  44. Chang SD, Chao AS, Peng HH, Chang YL, Wang CN, et al. (2011) Analyses of placental gene expression in pregnancy-related hypertensive disorders. Taiwan J Obstet Gynecol 50: 283–291.
  45. Wang TH, Lee YS, Chen ES, Kong WH, Chen LK, et al. (2004) Establishment of cDNA microarray analysis at the Genomic Medicine Research Core Laboratory (GMRCL) of Chang Gung Memorial Hospital. Chang Gung Med J 27: 243–260.
  46. Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C, et al. (1997) Yeast microarrays for genome wide parallel genetic and gene expression analysis. Proc Natl Acad Sci U S A 94: 13057–13062.
  47. Tsai ML, Chang KY, Chiang CS, Shu WY, Weng TC, et al. (2009) UVB radiation induces persistent activation of ribosome and oxidative phosphorylation pathways. Radiat Res 171: 716–724.
  48. Huang CL, Shu WY, Tsai ML, Chiang CS, Chang CW, et al. (2011) Repeated small perturbation approach reveals transcriptomic steady states. PLoS ONE 6: e29241.
  49. Chen CR, Shu WY, Tsai ML, Cheng WC, Hsu IC (2012) THEME: A web tool for loop-design microarray data analysis. Comput Biol Med 42: 228–234.
  50. Dennis G Jr, Sherman BT, Hosack DA, Yang J, Gao W, et al. (2003) DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4: P3.
  51. Tsai MS, Hwang SM, Chen KD, Lee YS, Hsu LW, et al. (2007) Functional network analysis of the transcriptomes of mesenchymal stem cells derived from amniotic fluid, amniotic membrane, cord blood, and bone marrow. Stem Cells 25: 2511–2523.
  52. Wang TH, Chao A (2007) Microarray analysis of gene expression of cancer to guide the use of chemotherapeutics. Taiwan J Obstet Gynecol 46: 222–229.
  53. Richani K, Romero R, Soto E, Nien JK, Cushenberry E, et al. (2007) Genetic origin and proportion of basal plate surface-lining cells in normal and abnormal pregnancies. Hum Pathol 38: 269–275.
  54. Cui X, Hwang JT, Qiu J, Blades NJ, Churchill GA (2005) Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 6: 59–75.
  55. Hong F, Breitling R, McEntee CW, Wittner BS, Nemhauser JL, et al. (2006) RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics 22: 2825–2827.
  56. Boulesteix AL, Hothorn T (2010) Testing the additional predictive value of high-dimensional molecular data. BMC Bioinformatics 11: 78.
  57. Korkola JE, DeVries S, Fridlyand J, Hwang ES, Estep AL, et al. (2003) Differentiation of lobular versus ductal breast carcinomas by expression microarray analysis. Cancer Res 63: 7167–7175.
  58. Lage-Castellanos A, Martinez-Montes E, Hernandez-Cabrera JA, Galan L (2010) False discovery rate and permutation test: an evaluation in ERP data analysis. Stat Med 29: 63–74.
  59. Sohn I, Owzar K, George SL, Kim S, Jung SH (2009) A permutation-based multiple testing method for time-course microarray experiments. BMC Bioinformatics 10: 336.
  60. Guo L, Lobenhofer EK, Wang C, Shippy R, Harris SC, et al. (2006) Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat Biotechnol 24: 1162–1169.
  61. Pan KH, Lih CJ, Cohen SN (2005) Effects of threshold choice on biological conclusions reached during analysis of gene expression by DNA microarrays. Proc Natl Acad Sci U S A 102: 8961–8965.
  62. Chuchana P, Holzmuller P, Vezilier F, Berthier D, Chantal I, et al. (2010) Intertwining threshold settings, biological data and database knowledge to optimize the selection of differentially expressed genes from microarray. PLoS ONE 5: e13518.
  63. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, et al. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci U S A 96: 6745–6750.