Parkinson disease (PD) progresses relentlessly and affects approximately 4% of the population aged over 80 years. It is difficult to diagnose in its early stages. The purpose of our study is to identify molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. We downloaded the gene expression profile of PD from Gene Expression Omnibus and identified differentially coexpressed genes (DCGs) and dysfunctional pathways in PD patients compared to controls. In addition, we built a regulatory network by mapping the DCGs to known regulatory data between transcription factors (TFs) and target genes and calculated the regulatory impact factor of each transcription factor. As a result, a total of 1004 genes associated with PD initiation were identified. Pathway enrichment of these genes suggests that biological processes of protein turnover were impaired in PD. In the regulatory network, HLF, E2F1 and STAT4 were found to have altered expression levels in PD patients. The expression levels of other transcription factors, NKX3-1, TAL1, RFX1 and EGR3, were not found to be altered; however, they regulated differentially expressed genes. In conclusion, we suggest that HLF, E2F1 and STAT4 may be used as molecular biomarkers for PD; however, more work is needed to validate our result.
Citation: Diao H, Li X, Hu S, Liu Y (2012) Gene Expression Profiling Combined with Bioinformatics Analysis Identify Biomarkers for Parkinson Disease. PLoS ONE 7(12): e52319. https://doi.org/10.1371/journal.pone.0052319
Editor: Malú G. Tansey, Emory University, United States of America
Received: August 9, 2012; Accepted: November 16, 2012; Published: December 28, 2012
Copyright: © 2012 Diao et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This research is supported by Liaoning Science and Technology Plan Projects (No. 2008820). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Parkinson disease (PD) is a common chronic neurodegenerative disorder characterized by selective loss of dopaminergic neurons from the substantia nigra and the presence of Lewy bodies. The obvious symptoms are tremor at rest, muscle rigidity, bradykinesia and other movement-related symptoms. PD is difficult to diagnose in its early stages, and once it is diagnosed, the only available treatment involves boosting inadequate levels of dopamine in the brain, which does not eliminate all symptoms. Therefore, it is important to find molecular biomarkers of PD to improve diagnostic accuracy, monitor disease progression and develop therapeutic interventions.
The etiology of PD remains a puzzling mix of environmental factors, genes and the aged brain. Epidemiological research indicates that exposure to pesticides elevates the risk of PD, whereas caffeine and tobacco are associated with reduced risk. In recent years, several causative genes of PD have been identified, including α-synuclein (SNCA), parkin (PARK2), UCHL-1 (PARK5), PINK1 (PARK6), DJ-1 (PARK7), LRRK2 (PARK8) and ATP13A2 (PARK9). These PD-linked molecules are candidate biomarkers for PD. Among them, the levels of DJ-1 and α-synuclein in human cerebrospinal fluid and blood are the measures most frequently compared between PD patients and non-PD controls in previous studies; however, the results are conflicting. At this stage, neither DJ-1 nor α-synuclein alone appears to be satisfactory as a biomarker for PD. In addition, changed levels of urinary 8-hydroxydeoxyguanosine (8-OHdG) and of proinflammatory cytokines such as tumor necrosis factor α (TNF-α), interleukin 6 (IL-6) and interleukin 1β (IL-1β) have also been studied as biomarkers for PD. Godau et al. recently showed that the levels of serum insulin-like growth factor 1 (IGF-1) were significantly higher in PD patients than in controls.
The purpose of this study is to identify molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. The availability and integration of high-throughput gene expression data, together with computational bioinformatics analysis, may shed new light on the identification of molecular biomarkers of PD.
Materials and Methods
Affymetrix Microarray Data
The transcription profile GSE20333 was downloaded from GEO (Gene Expression Omnibus; http://www.ncbi.nlm.nih.gov/geo/), a public functional genomics data repository. The Affymetrix HG-Focus array was used to determine a global gene expression profile of clinically and neuropathologically confirmed cases of sporadic Parkinson disease (n = 6) compared to controls (n = 6). Postmortem human brains were obtained from individuals with moderate to severe parkinsonism based on the Hoehn & Yahr criteria. The average age of the PD and control groups was 76.6 and 77.8 years, respectively. The average postmortem delay for the PD and control groups was 26.2 and 19.8 hours, respectively.
KEGG (Kyoto Encyclopedia of Genes and Genomes) is one of the most popular pathway databases; it groups genes into pathways of interacting genes and substrates, and contains specific links between genes and substrates that interact directly. The PATHWAY database records networks of molecular interactions in cells, along with variants of them specific to particular organisms (http://www.genome.jp/kegg/). We collected pathway information from KEGG on June 30, 2011.
UCSC (http://genome.ucsc.edu) is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. We downloaded the human transcription factors (TFs) and their target chromosome regions from UCSC. We then downloaded the chromosome annotation information from NCBI and analyzed the relationships between TFs and their target genes.
Differential Coexpression Analysis
From the perspective of systems biology, functionally related genes are frequently coexpressed across a set of samples. Differentially Coexpressed Genes and Links (DCGL) is an R package designed for identifying differentially coexpressed genes and links from gene expression microarray data.
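To make the idea of differential coexpression concrete, the following is a minimal Python sketch. It is not the DCGL algorithm, whose scoring methods are not described in this text; it only assumes that each gene can be scored by the root-mean-square change in its Pearson correlations with all other genes between two conditions. The function names and the toy expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_change(expr_case, expr_ctrl):
    """Score each gene by the RMS change of its correlations with all other genes.

    expr_case / expr_ctrl: dict mapping gene -> list of expression values,
    one value per sample, with the same genes in both dicts.
    """
    genes = sorted(expr_case)
    squared = {g: 0.0 for g in genes}
    for g, h in combinations(genes, 2):
        diff = pearson(expr_case[g], expr_case[h]) - pearson(expr_ctrl[g], expr_ctrl[h])
        squared[g] += diff * diff
        squared[h] += diff * diff
    return {g: math.sqrt(squared[g] / (len(genes) - 1)) for g in genes}


# Toy data (invented): gene B reverses its relationship with A and C in cases.
expr_ctrl = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4]}
expr_case = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [1, 3, 2, 4]}
scores = coexpression_change(expr_case, expr_ctrl)
top_gene = max(scores, key=scores.get)
```

In this toy example the gene whose correlations flip sign receives the largest score, which is the kind of gene a differential coexpression analysis is meant to surface even when its mean expression level is unchanged.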
For GSE20333, we used the DCGL package in R to identify differentially coexpressed genes (DCGs) and links in PD patients compared to non-PD controls. We calculated p-values and adjusted the raw p-values to false discovery rates (FDR) using the Benjamini-Hochberg method to address the multiple-testing problem, which might otherwise produce too many false-positive results. Only genes with FDR < 0.25 were selected as differentially coexpressed genes.
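The Benjamini-Hochberg adjustment and the FDR < 0.25 cut-off can be sketched as follows. This is a plain Python illustration of the standard procedure, not the code used in the study; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each is the minimum of p * m / rank seen so far).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw_p = [0.001, 0.02, 0.04, 0.30, 0.80]
fdr = benjamini_hochberg(raw_p)
# Keep genes whose adjusted value is below the threshold used in the text.
selected = [i for i, q in enumerate(fdr) if q < 0.25]
```

With these invented values the first three genes pass the threshold, while the fourth has an adjusted value of 0.375 and is excluded.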
Pathway Enrichment Analysis
To facilitate the functional annotation and analysis of the large list of genes in our result, we submitted all the DCGs to DAVID (The Database for Annotation, Visualization and Integrated Discovery) for KEGG pathway enrichment analysis. DAVID identifies canonical pathways associated with a given list of genes by calculating a hypergeometric test p-value for the probability that the association between this set of genes and a canonical pathway arises by chance. We chose p-value < 0.05 as the cut-off criterion.
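The hypergeometric enrichment test can be illustrated with a short Python sketch. It computes the upper-tail probability of seeing at least the observed overlap between a gene list and a pathway. The function name and all counts are hypothetical, and DAVID's own background set and any scoring adjustments it applies are not reproduced here.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_list, n_overlap):
    """P(X >= n_overlap) when n_list genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    total = comb(n_background, n_list)
    upper = min(n_pathway, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Invented counts: 20 background genes, 5 in the pathway,
# a list of 4 genes of which 3 fall in the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
is_enriched = p_value < 0.05
```

A pathway would be reported as enriched when this p-value falls below the chosen cut-off of 0.05.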
Measures of RIF
Regulatory impact factor (RIF) appears to be a robust and valuable methodology for identifying the regulators with the strongest evidence of contributing to differential expression between two biological conditions. It is a metric assigned to each TF that reflects the change in coexpression between the TF and the differentially expressed genes (DEGs, i.e. the potential targets). The measures of RIF are computed as follows:

$$\mathrm{RIF}_i = \frac{1}{n_{de}} \sum_{j=1}^{n_{de}} \left[ \left( e1_j \times r1_{ij} \right)^2 - \left( e2_j \times r2_{ij} \right)^2 \right] \qquad (1)$$

where $n_{de}$ is the number of DEGs; $e1_j$ and $e2_j$ represent the expression value of the $j$th DEG in conditions 1 and 2, respectively; $r1_{ij}$ and $r2_{ij}$ represent the coexpression correlation between the $i$th TF and the $j$th DEG in conditions 1 and 2, respectively.
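As an illustration, the RIF of a single TF can be sketched in Python. The sketch assumes the form of the measure in which each DEG contributes the difference between its squared expression-weighted correlation with the TF in condition 1 and in condition 2, averaged over all DEGs; this is an assumption based on the variables defined in the text, and the function name and toy values are hypothetical.

```python
def regulatory_impact_factor(e1, e2, r1, r2):
    """RIF of one TF under the assumed form
    (1 / n_de) * sum_j [(e1_j * r1_j)^2 - (e2_j * r2_j)^2].

    e1, e2: expression of each DEG in conditions 1 and 2.
    r1, r2: correlation between the TF and each DEG in conditions 1 and 2.
    """
    n_de = len(e1)
    if n_de == 0 or not (len(e2) == len(r1) == len(r2) == n_de):
        raise ValueError("all inputs must be non-empty and of equal length")
    total = sum(
        (a * p) ** 2 - (b * q) ** 2
        for a, b, p, q in zip(e1, e2, r1, r2)
    )
    return total / n_de


# Invented values for two DEGs.
rif = regulatory_impact_factor(
    e1=[2.0, 4.0], e2=[1.0, 4.0],
    r1=[0.5, -0.5], r2=[0.0, 0.5],
)
```

TFs would then be ranked by this value, and the highest-ranked ones taken as the regulators most likely to contribute to the expression differences between the two conditions.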
Identification of Differentially Coexpressed Genes in PD
We downloaded the publicly available microarray dataset GSE20333 from the GEO database and applied the DCGL package in R to identify DCGs in 6 PD patients and 6 non-PD controls. Among all genes tested, we found a total of 1004 DCGs with FDR < 0.25. In addition, a total of 459683 links were predicted among these DCGs.
Enrichment of PD Associated Pathways
To functionally annotate the large list of genes in our result, we used the online biological classification tool DAVID and observed significant enrichment of these genes in multiple KEGG categories (Table 1). Pathway analysis revealed that the DCGs were strongly associated with the Ribosome (p = 2.21E-06) and Neurotrophin signaling (p = 1.45E-04) pathways. In addition, the Steroid biosynthesis, Spliceosome, and NOD-like receptor signaling pathways showed evidence of association with the differentially coexpressed genes (p < 0.01).
Regulatory Network Construction
We matched the 1004 DCGs and the 459683 links to the known regulatory data between transcription factors (TFs) and target genes, and obtained a total of 745 regulatory relationships between 82 TFs and 601 target genes. By integrating these regulatory relationships, we built a regulatory network using Cytoscape (Figure 1).
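The matching step can be sketched in Python as a filter that keeps only those coexpression links that coincide with a known TF-target pair. The exact matching rule used in the study is not specified, so this is an assumption; the function name and the gene and TF identifiers are placeholders.

```python
def build_regulatory_network(dcg_links, tf_targets):
    """Keep coexpression links that coincide with a known TF -> target pair.

    dcg_links: iterable of (gene_a, gene_b) undirected links between DCGs.
    tf_targets: dict mapping each TF to the set of its known target genes.
    Returns a sorted list of directed (tf, target) edges.
    """
    edges = set()
    for a, b in dcg_links:
        # A link is undirected, so test both orientations.
        if b in tf_targets.get(a, ()):
            edges.add((a, b))
        if a in tf_targets.get(b, ()):
            edges.add((b, a))
    return sorted(edges)


# Placeholder identifiers, not real regulatory data.
links = [("TF_B", "gene1"), ("gene2", "TF_A"), ("gene1", "gene3")]
known = {"TF_A": {"gene2"}, "TF_B": {"gene1", "gene4"}}
network = build_regulatory_network(links, known)
n_tfs = len({tf for tf, _ in network})
n_targets = len({target for _, target in network})
```

Counting the distinct TFs and targets in the resulting edge list gives the kind of summary reported in the text (pairs, TFs and target genes), and the edge list itself is in a form that can be exported for network visualization.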
Impact Analysis of Transcription Factor
The above network contains a vast amount of data. To focus on the most meaningful information, we calculated the RIF of each TF. The top 5 ranked TFs are HLF (hepatic leukemia factor), NKX3-1 (NK3 homeobox 1), TAL1 (T cell acute lymphocytic leukemia 1), RFX1 (regulatory factor X, 1) and EGR3 (early growth response 3) (Table 2). The relationships between these top 5 TFs and their target genes are shown in Figure 2 and Table 3. Table 3 shows that HLF, E2F1 (E2F transcription factor 1) and STAT4 (signal transducer and activator of transcription 4) are TFs as well as DCGs. Other TFs, such as NKX3-1, TAL1, RFX1 and EGR3, are not DCGs, but their target genes are.
The red nodes represent transcription factors and the green nodes represent their target genes.
Molecular biomarkers are useful for improving diagnosis, predicting clinical behavior and demonstrating the efficacy of new therapies. Since microarrays can interrogate the expression levels of thousands of genes in the human genome simultaneously, they have been widely used in the discovery of disease biomarkers. In this work, we have analysed gene expression data with computational methods with the aim of uncovering genes that are potentially dysregulated in PD. We identified a total of 1004 DCGs in PD patients compared to non-PD controls. After regulatory network construction and regulatory impact factor analysis, we found that the transcription factors HLF, E2F1, STAT4, NKX3-1, TAL1, RFX1 and EGR3 may play important roles in PD initiation. Of these, HLF, STAT4 and E2F1 were found to have altered expression levels in PD patients. The expression levels of the other transcription factors, NKX3-1, TAL1, RFX1 and EGR3, were not found to be altered; however, they regulated differentially expressed genes.
HLF encodes a member of the proline and acidic-rich protein family, a subset of the bZIP transcription factors. Chromosomal translocations fusing portions of this gene with the E2A gene cause a subset of childhood B-lineage acute lymphoid leukemias. Although HLF has been linked to malignancies of the lymphoid system, it is also detected in the liver, kidney, and adult nervous system by northern blotting. Hitzler et al. found that HLF expression increased markedly with synaptogenesis and was coincident with barrel formation, and suggested that HLF plays a role in the function of differentiated neurons in the adult nervous system. In our analysis, HLF appears as the most significant transcription factor related to the differential expression of genes in PD patients.
E2F1 is a member of the E2F family of transcription factors. The E2F family plays a crucial role in the control of the cell cycle and the action of tumor suppressor proteins, and is also a target of the transforming proteins of small DNA tumor viruses. Several studies using in vitro models of neurodegeneration have demonstrated that E2F1 contributes to neuronal damage and death. E2F1 immunoreactivity and/or protein levels were reported to increase in neurons of patients with PD. That study showed not only that the pRb/E2F pathway is activated in dopaminergic neurons in PD, but also that activation of this pathway is instrumental in the degeneration of these neurons in the MPTP/MPP+ model of the disease. In a recent study, Lu and colleagues showed that mutations in LRRK2 cause PD by inhibiting the translational repression of the transcription factors E2F1 and DP.
STAT4 is a transcription factor belonging to the signal transducer and activator of transcription protein family. STAT4 is involved in the signaling of interleukin-12 and interferon-γ, as well as interleukin-23. Although we found that STAT4 was differentially expressed in PD patients compared to non-PD controls, the gene has no known role in PD pathogenesis to date.
Table 1 shows that the most significantly enriched pathway is ribosome, which is responsible for catalyzing the formation of proteins from individual amino acids. In addition, other pathways associated with protein synthesis, such as steroid biosynthesis and spliceosome, were also enriched. This result suggests that biological processes of protein turnover were impaired in PD. Our result is in line with previous studies.
In conclusion, we have identified molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. A total of 1004 differentially coexpressed genes were identified between PD patients and non-PD controls. Pathway enrichment of these genes suggests that biological processes of protein turnover were impaired in PD. After regulatory network construction and regulatory impact factor analysis, we found that the transcription factors HLF, E2F1, STAT4, NKX3-1, TAL1, RFX1 and EGR3 may play important roles in PD initiation. Of these, HLF, STAT4 and E2F1 were found to have altered expression levels in PD patients. Therefore, we suggest that HLF, E2F1 and STAT4 may be used as biomarkers for PD; however, more work is needed to validate our result.
Conceived and designed the experiments: HD. Performed the experiments: XL. Analyzed the data: HD XL. Contributed reagents/materials/analysis tools: SH. Wrote the paper: YL.
- 1. Foulds P, Mann DM, Mitchell JD, Allsop D (2010) Parkinson disease: Progress towards a molecular biomarker for Parkinson disease. Nat Rev Neurol 6: 359–361.
- 2. Jankovic J (2008) Parkinson’s disease: clinical features and diagnosis. J Neurol Neurosurg Psychiatry 79: 368–376.
- 3. Elbaz A, Dufouil C, Alperovitch A (2007) Interaction between genes and environment in neurodegenerative diseases. C R Biol 330: 318–328.
- 4. Douglas PM, Dillin A (2010) Protein homeostasis and aging in neurodegeneration. J Cell Biol 190: 719–729.
- 5. Chade AR, Kasten M, Tanner CM (2006) Nongenetic causes of Parkinson’s disease. J Neural Transm Suppl 70: 147–151.
- 6. Hashimoto M, Masliah E (1999) Alpha-synuclein in Lewy body disease and Alzheimer’s disease. Brain Pathol 9: 707–720.
- 7. Hatano T, Kubo S, Sato S, Hattori N (2009) Pathogenesis of familial Parkinson’s disease: new insights based on monogenic forms of Parkinson’s disease. J Neurochem 111: 1075–1093.
- 8. Devic I, Hwang H, Edgar JS, Izutsu K, Presland R, et al. (2011) Salivary alpha-synuclein and DJ-1: potential biomarkers for Parkinson’s disease. Brain 134: e178.
- 9. Waragai M, Wei J, Fujita M, Nakai M, Ho GJ, et al. (2006) Increased level of DJ-1 in the cerebrospinal fluids of sporadic Parkinson’s disease. Biochem Biophys Res Commun 345: 967–972.
- 10. Maita C, Tsuji S, Yabe I, Hamada S, Ogata A, et al. (2008) Secretion of DJ-1 into the serum of patients with Parkinson’s disease. Neurosci Lett 431: 86–89.
- 11. Tokuda T, Salem SA, Allsop D, Mizuno T, Nakagawa M, et al. (2006) Decreased alpha-synuclein in cerebrospinal fluid of aged individuals and subjects with Parkinson’s disease. Biochem Biophys Res Commun 349: 162–166.
- 12. Ohrfelt A, Grognet P, Andreasen N, Wallin A, Vanmechelen E, et al. (2009) Cerebrospinal fluid alpha-synuclein in neurodegenerative disorders-a marker of synapse loss? Neurosci Lett 450: 332–335.
- 13. Hong Z, Shi M, Chung KA, Quinn JF, Peskind ER, et al. (2010) DJ-1 and alpha-synuclein in human cerebrospinal fluid as biomarkers of Parkinson’s disease. Brain 133: 713–726.
- 14. Nagatsu T, Sawada M (2005) Inflammatory process in Parkinson’s disease: role for cytokines. Curr Pharm Des 11: 999–1016.
- 15. Sato S, Mizuno Y, Hattori N (2005) Urinary 8-hydroxydeoxyguanosine levels as a biomarker for progression of Parkinson disease. Neurology 64: 1081–1083.
- 16. Godau J, Herfurth M, Kattner B, Gasser T, Berg D (2010) Increased serum insulin-like growth factor 1 in early idiopathic Parkinson’s disease. J Neurol Neurosurg Psychiatry 81: 536–538.
- 17. Kanehisa M (2002) The KEGG database. Novartis Foundation symposium 247: 91–101; discussion 101–103, 119–128, 244–152.
- 18. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27–30.
- 19. Stuart JM, Segal E, Koller D, Kim SK (2003) A gene-coexpression network for global discovery of conserved genetic modules. Science 302: 249–255.
- 20. Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P (2004) Coexpression analysis of human genes across many microarray data sets. Genome Res 14: 1085–1094.
- 21. Bergmann S, Ihmels J, Barkai N (2004) Similarities and differences in genome-wide expression data of six organisms. PLoS Biol 2: E9.
- 22. Liu BH, Yu H, Tu K, Li C, Li YX, et al. (2010) DCGL: an R package for identifying differentially coexpressed genes and links from gene expression microarray data. Bioinformatics 26: 2637–2638.
- 23. Yu H, Liu BH, Ye ZQ, Li C, Li YX, et al. (2011) Link-based quantitative methods to identify differentially coexpressed genes and gene pairs. BMC Bioinformatics 12: 315.
- 24. R Development Core Team (2011) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
- 25. Benjamini Y, Hochberg Y (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological) 57: 289–300.
- 26. Huang da W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4: 44–57.
- 27. Reverter A, Hudson NJ, Nagaraj SH, Perez-Enciso M, Dalrymple BP (2010) Regulatory impact factors: unraveling the transcriptional regulation of complex traits from expression data. Bioinformatics 26: 896–904.
- 28. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504.
- 29. Cooper CS, Campbell C, Jhavar S (2007) Mechanisms of Disease: biomarkers and molecular targets from microarray gene expression studies in prostate cancer. Nat Clin Pract Urol 4: 677–687.
- 30. Scherzer CR, Eklund AC, Morse LJ, Liao Z, Locascio JJ, et al. (2007) Molecular markers of early Parkinson’s disease based on gene expression in blood. Proc Natl Acad Sci U S A 104: 955–960.
- 31. Guttula SV, Allam A, Gumpeny RS (2012) Analyzing microarray data of Alzheimer’s using cluster analysis to identify the biomarker genes. Int J Alzheimers Dis 2012: 649456.
- 32. Honda H, Inaba T, Suzuki T, Oda H, Ebihara Y, et al. (1999) Expression of E2A-HLF chimeric protein induced T-cell apoptosis, B-cell maturation arrest, and development of acute lymphoblastic leukemia. Blood 93: 2780–2790.
- 33. Inaba T, Roberts WM, Shapiro LH, Jolly KW, Raimondi SC, et al. (1992) Fusion of the leucine zipper gene HLF to the E2A gene in human acute B-lineage leukemia. Science 257: 531–534.
- 34. Hitzler JK, Soares HD, Drolet DW, Inaba T, O’Connel S, et al. (1999) Expression patterns of the hepatic leukemia factor gene in the nervous system of developing and adult mice. Brain Res 820: 1–11.
- 35. Hou ST, Xie X, Baggley A, Park DS, Chen G, et al. (2002) Activation of the Rb/E2F1 pathway by the nonproliferative p38 MAPK during Fas (APO1/CD95)-mediated neuronal apoptosis. J Biol Chem 277: 48764–48770.
- 36. Jiang SX, Sheldrick M, Desbois A, Slinn J, Hou ST (2007) Neuropilin-1 is a direct target of the transcription factor E2F1 during cerebral ischemia-induced neuronal death in vivo. Mol Cell Biol 27: 1696–1705.
- 37. Smith RA, Walker T, Xie X, Hou ST (2003) Involvement of the transcription factor E2F1/Rb in kainic acid-induced death of murine cerebellar granule cells. Brain Res Mol Brain Res 116: 70–79.
- 38. Hoglinger GU, Breunig JJ, Depboylu C, Rouaux C, Michel PP, et al. (2007) The pRb/E2F cell-cycle pathway mediates cell death in Parkinson’s disease. Proc Natl Acad Sci U S A 104: 3585–3590.
- 39. Gehrke S, Imai Y, Sokol N, Lu B (2010) Pathogenic LRRK2 negatively regulates microRNA-mediated translational repression. Nature 466: 637–641.
- 40. Yamamoto K, Quelle FW, Thierfelder WE, Kreider BL, Gilbert DJ, et al. (1994) Stat4, a novel gamma interferon activation site-binding protein expressed in early myeloid differentiation. Mol Cell Biol 14: 4342–4349.
- 41. Bacon CM, Petricoin EF 3rd, Ortaldo JR, Rees RC, Larner AC, et al. (1995) Interleukin 12 induces tyrosine phosphorylation and activation of STAT4 in human lymphocytes. Proc Natl Acad Sci U S A 92: 7307–7311.
- 42. Rubinsztein DC (2006) The roles of intracellular protein-degradation pathways in neurodegeneration. Nature 443: 780–786.
- 43. Davie CA (2008) A review of Parkinson’s disease. Br Med Bull 86: 109–127.