Gene Expression Profiling Combined with Bioinformatics Analysis Identify Biomarkers for Parkinson Disease

  • Hongyu Diao,

    Affiliation Department of Neurosurgery, Shengjing Hospital of China Medical University, Shenyang, China

  • Xinxing Li,

    Affiliation Department of Neurosurgery, Shengjing Hospital of China Medical University, Shenyang, China

  • Sheng Hu,

    Affiliation Department of Neurosurgery, The Second People’s Hospital of Chaoyang City, Chaoyang, China

  • Yunhui Liu

    liuyh@sj-hospital.org

    Affiliation Department of Neurosurgery, Shengjing Hospital of China Medical University, Shenyang, China

Abstract

Parkinson disease (PD) progresses relentlessly and affects approximately 4% of the population aged over 80 years. It is difficult to diagnose in its early stages. The purpose of our study was to identify molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. We downloaded a PD gene expression profile from Gene Expression Omnibus and identified differentially coexpressed genes (DCGs) and dysfunctional pathways in PD patients compared to controls. In addition, we built a regulatory network by mapping the DCGs to known regulatory data between transcription factors (TFs) and target genes, and calculated the regulatory impact factor of each transcription factor. As a result, a total of 1004 genes associated with PD initiation were identified. Pathway enrichment of these genes suggests that biological processes of protein turnover are impaired in PD. In the regulatory network, HLF, E2F1 and STAT4 were found to have altered expression levels in PD patients. The expression levels of the other transcription factors, NKX3-1, TAL1, RFX1 and EGR3, were not found to be altered; however, they regulated differentially expressed genes. In conclusion, we suggest that HLF, E2F1 and STAT4 may be used as molecular biomarkers for PD; however, more work is needed to validate our result.

Introduction

Parkinson disease (PD) is a common chronic neurodegenerative disorder characterized by selective loss of dopaminergic neurons from the substantia nigra and the presence of Lewy bodies [1]. The obvious symptoms are tremor at rest, muscle rigidity, bradykinesia and other movement-related symptoms [2]. PD is difficult to diagnose in its early stages, and once it is diagnosed, the only treatment involves boosting inadequate levels of dopamine in the brain, which does not eliminate all symptoms. Therefore, it is of significant importance to find molecular biomarkers of PD to improve diagnostic accuracy, monitor disease progression and develop therapeutic interventions.

The etiology of PD remains a puzzling mix of environmental factors, genes and the aged brain [3], [4]. Epidemiological research indicates that exposure to pesticides elevates the risk of PD. By contrast, caffeine and tobacco are associated with reduced risk [5]. In recent years, several causative genes of PD have been identified, including α-synuclein (SNCA), parkin (PARK2), UCHL-1 (PARK5), PINK1 (PARK6), DJ-1 (PARK7), LRRK2 (PARK8) and ATP13A2 (PARK9) [6], [7]. These PD-linked molecules are candidate biomarkers for PD [8]. Among them, the levels of DJ-1 and α-synuclein in human cerebrospinal fluid and blood are the biomarkers most frequently compared between PD patients and non-PD controls in previous studies; however, the results are conflicting [9], [10], [11], [12], [13]. At this stage, neither DJ-1 nor α-synuclein alone appears to be satisfactory as a biomarker for PD. In addition, changed levels of urinary 8-hydroxydeoxyguanosine (8-OHdG) and of proinflammatory cytokines such as tumor necrosis factor α (TNF-α), interleukin 6 (IL-6) and interleukin 1β (IL-1β) have also been studied as biomarkers for PD [14], [15]. Godau et al. recently showed that the levels of serum insulin-like growth factor 1 (IGF-1) were significantly higher in PD patients than in controls [16].

The purpose of this study is to identify molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. The availability and integration of high-throughput gene expression data and computational bioinformatics analysis may shed new light on molecular biomarker identification in PD.

Materials and Methods

Affymetrix Microarray Data

The transcription profile GSE20333 was downloaded from the public functional genomics data repository GEO (Gene Expression Omnibus) (http://www.ncbi.nlm.nih.gov/geo/). The Affymetrix HG-Focus array was used to determine a global gene expression profile of clinically and neuropathologically confirmed cases of sporadic Parkinson disease (n = 6) compared to controls (n = 6). Postmortem human brains were obtained from individuals with moderate to severe Parkinsonism based on the Hoehn & Yahr criteria. The average age of the PD and control groups was 76.6 and 77.8 years, respectively. The average postmortem delay for the PD and control groups was 26.2 and 19.8 hours, respectively.

Pathway Data

KEGG (Kyoto Encyclopedia of Genes and Genomes) is one of the most popular pathway databases; it groups genes into pathways of interacting genes and substrates, and contains specific links between genes and substrates that interact directly [17], [18]. The PATHWAY database records networks of molecular interactions in the cells, and variants of them specific to particular organisms (http://www.genome.jp/kegg/). We collected pathway information from KEGG on June 30, 2011.

Regulatory Data

UCSC (http://genome.ucsc.edu) is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. We downloaded the human transcription factors (TFs) and their target chromosome regions from UCSC. Then, we downloaded the chromosome annotation information from NCBI and used it to determine the relationships between TFs and their target genes.

Differential Coexpression Analysis

From the perspective of systems biology, functionally related genes are frequently coexpressed across a set of samples [19], [20], [21]. Differentially Coexpressed Genes and Links (DCGL) is designed for identifying differentially coexpressed genes and links from gene expression microarray data [22].
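The idea of differential coexpression can be illustrated with a minimal sketch. It is not the DCGL algorithm; it only assumes that coexpression of a gene pair is measured by the Pearson correlation within each condition and that the change between conditions is their difference. The function names and the expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_change(a_ctrl, b_ctrl, a_case, b_case):
    """Correlation of genes a and b in cases minus their correlation in controls."""
    return pearson(a_case, b_case) - pearson(a_ctrl, b_ctrl)


# Illustrative values only (not taken from GSE20333).
gene_a_ctrl = [1.0, 2.0, 3.0, 4.0]
gene_b_ctrl = [2.0, 4.0, 6.0, 8.0]   # rises with gene a in controls
gene_a_case = [1.0, 2.0, 3.0, 4.0]
gene_b_case = [8.0, 6.0, 4.0, 2.0]   # falls as gene a rises in cases

delta = coexpression_change(gene_a_ctrl, gene_b_ctrl, gene_a_case, gene_b_case)
print(delta)
```

A pair whose correlation flips sign between conditions, as in this toy example, is the kind of link a differential coexpression method is designed to flag.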

For GSE20333, we used the DCGL package [22], [23] in R [24] to identify differentially coexpressed genes (DCGs) and links in PD patients compared to non-PD controls. We calculated the raw p-values and converted them to false discovery rates (FDR) using the Benjamini-Hochberg method [25] to address the multiple-testing problem, which might otherwise produce too many false positive results. Only genes with FDR <0.25 were selected as differentially coexpressed genes.
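The Benjamini-Hochberg adjustment and the FDR <0.25 cut-off can be sketched as follows. This is a generic implementation of the step-up procedure, not the code used in the study; the function name and the raw p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative raw p-values only (not taken from the study).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(raw_p)

# Keep the genes whose FDR falls below the 0.25 threshold used in the text.
selected = [i for i, q in enumerate(fdr) if q < 0.25]
print(fdr, selected)
```

In R, the same adjustment is available through `p.adjust(p, method = "BH")`.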

Pathway Enrichment Analysis

To facilitate the functional annotation and analysis of the large list of genes in our result, we submitted all the DCGs to DAVID (The Database for Annotation, Visualization and Integrated Discovery) for KEGG term enrichment analysis. DAVID identifies canonical pathways associated with a given list of genes by calculating a hypergeometric test p-value for the probability that the association between this set of genes and a canonical pathway arises by chance [26]. We chose p-value <0.05 as the cut-off criterion.
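As a minimal sketch of the hypergeometric test described above, the upper-tail probability of seeing at least k pathway genes in a list of n genes can be computed directly. This is a plain hypergeometric test and may differ from the exact score DAVID reports; the function name, argument names and the numbers in the example are assumptions for illustration.

```python
from math import comb


def hypergeom_enrichment_p(background, pathway, listed, overlap):
    """P(X >= overlap) when `listed` genes are drawn from `background` genes,
    of which `pathway` belong to the pathway of interest."""
    upper = min(pathway, listed)
    total = comb(background, listed)
    tail = sum(
        comb(pathway, i) * comb(background - pathway, listed - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 20 background genes, 5 in the pathway, 4 genes in the list,
# 2 of which fall in the pathway.
p_value = hypergeom_enrichment_p(background=20, pathway=5, listed=4, overlap=2)
print(p_value)
```

A pathway would be reported as enriched when this p-value falls below the chosen cut-off (0.05 in the text).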

Measures of RIF

Regulatory impact factor (RIF) appears to be a robust and valuable methodology for identifying the regulators with the strongest evidence of contributing to differential expression between two biological conditions. It is a metric assigned to each TF that captures the change in coexpression between the TF and the differentially expressed genes (DEGs, i.e. the potential targets), weighted by the expression of those DEGs. The RIF is computed as follows [27]:

$$RIF_i = \frac{1}{n_{de}} \sum_{j=1}^{n_{de}} \left[ \left( e1_j \times r1_{ij} \right)^2 - \left( e2_j \times r2_{ij} \right)^2 \right] \tag{1}$$

where nde is the number of DEGs; e1j and e2j represent the expression value of the jth DEG in conditions 1 and 2, respectively; r1ij and r2ij represent the coexpression correlation between the ith TF and the jth DEG in conditions 1 and 2, respectively.
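A minimal sketch of the RIF calculation for a single TF is given here. It assumes the RIF is the mean, over all DEGs, of (e1j × r1ij)² − (e2j × r2ij)², which matches the symbols defined above and the second RIF measure described by Reverter et al. [27]. The function name and the input values are hypothetical.

```python
def regulatory_impact_factor(e1, e2, r1, r2):
    """RIF of one TF.

    e1, e2: expression of each DEG in conditions 1 and 2.
    r1, r2: correlation between the TF and each DEG in conditions 1 and 2.
    All four lists are indexed by DEG and must have the same length.
    """
    n_de = len(e1)
    total = 0.0
    for j in range(n_de):
        total += (e1[j] * r1[j]) ** 2 - (e2[j] * r2[j]) ** 2
    return total / n_de


# Illustrative values for two DEGs (not taken from the study).
e1 = [2.0, 1.0]
e2 = [1.0, 1.0]
r1 = [0.5, -1.0]
r2 = [0.0, 1.0]

rif = regulatory_impact_factor(e1, e2, r1, r2)
print(rif)
```

TFs are then ranked by their RIF values, so a TF whose correlation with highly expressed DEGs changes strongly between conditions rises to the top.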

Results

Identification of Differentially Coexpressed Genes in PD

We downloaded the publicly available microarray dataset GSE20333 from the GEO database and applied the DCGL package in R to identify DCGs between 6 PD patients and 6 non-PD controls. Among all genes tested, we found a total of 1004 DCGs with FDR <0.25. In addition, a total of 459683 links were predicted among these DCGs.

Enrichment of PD Associated Pathways

To functionally annotate the large list of genes in our result, we used the online biological classification tool DAVID and observed significant enrichment of these genes in multiple KEGG categories (Table 1). Pathway analysis revealed that the DCGs were strongly associated with the Ribosome pathway (p = 2.21E-06) and the Neurotrophin signaling pathway (p = 1.45E-04). In addition, the Steroid biosynthesis, Spliceosome and NOD-like receptor signaling pathways showed evidence of association with the differentially coexpressed genes (p<0.01).

Regulatory Network Construction

We matched the 1004 DCGs and the 459683 links to the known regulatory data between transcription factors (TFs) and target genes, and obtained a total of 745 pairs of relationships between 82 TFs and 601 target genes. By integrating the regulatory relationships above, we built a regulatory network using Cytoscape [28] (Figure 1).
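The matching step can be sketched as a simple filter. The exact matching rule is not stated in the text, so this sketch assumes that a known TF-target pair is kept when its target gene is a DCG; the function name and the gene identifiers are hypothetical placeholders.

```python
def build_regulatory_network(tf_target_pairs, dcgs):
    """Keep known TF-target pairs whose target is a DCG (assumed rule)
    and return the edges plus the sets of TFs and targets involved."""
    dcg_set = set(dcgs)
    edges = [(tf, target) for tf, target in tf_target_pairs if target in dcg_set]
    tfs = {tf for tf, _ in edges}
    targets = {target for _, target in edges}
    return edges, tfs, targets


# Hypothetical identifiers, for illustration only.
known_pairs = [
    ("TF_A", "GENE_1"),
    ("TF_A", "GENE_2"),
    ("TF_B", "GENE_2"),
    ("TF_C", "GENE_9"),
]
dcg_list = ["GENE_1", "GENE_2", "TF_A"]

edges, tfs, targets = build_regulatory_network(known_pairs, dcg_list)
print(len(edges), sorted(tfs), sorted(targets))
```

The resulting edge list can be exported as a two-column table and loaded into a network viewer such as Cytoscape.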

Figure 1. Regulatory network construction among TFs and their target genes.

The red nodes represent TFs and the green nodes represent their target genes. Large nodes are differentially co-expressed genes and small nodes are non-DCGs.

http://dx.doi.org/10.1371/journal.pone.0052319.g001

Impact Analysis of Transcription Factor

The above network contains a vast amount of data. To focus on the most meaningful information, we calculated the RIF of each TF. The top 5 ranked TFs are HLF (hepatic leukemia factor), NKX3-1 (NK3 homeobox 1), TAL1 (T cell acute lymphocytic leukemia 1), RFX1 (regulatory factor X, 1) and EGR3 (early growth response 3) (Table 2). The relationships between these top 5 TFs and their target genes are shown in Figure 2 and Table 3. Table 3 shows that HLF, E2F1 (E2F transcription factor 1) and STAT4 (signal transducer and activator of transcription 4) are both TFs and DCGs. Other TFs, such as NKX3-1, TAL1, RFX1 and EGR3, are not DCGs, but their target genes are.

Figure 2. The regulatory relationships between the top 5 TFs and their target genes.

The red nodes represent transcription factors and the green nodes represent their target genes.

http://dx.doi.org/10.1371/journal.pone.0052319.g002

Table 3. The regulatory relationships between the top 5 TFs and their target genes.

http://dx.doi.org/10.1371/journal.pone.0052319.t003

Discussion

Molecular biomarkers are useful for improving diagnosis, predicting clinical behavior and demonstrating the efficacy of new therapies. Since microarrays can interrogate the expression levels of thousands of genes in the human genome simultaneously, they have been widely used in the discovery of disease biomarkers [29], [30], [31]. In this work, we analysed gene expression data with computational methods with the aim of uncovering genes that are potentially dysregulated in PD. We identified a total of 1004 DCGs in PD patients compared to non-PD controls. After regulatory network construction and regulatory impact factor analysis, we found that the transcription factors HLF, E2F1, STAT4, NKX3-1, TAL1, RFX1 and EGR3 may play important roles in PD initiation. Of these, HLF, STAT4 and E2F1 were found to have altered expression levels in PD patients. The expression levels of the other transcription factors, NKX3-1, TAL1, RFX1 and EGR3, were not found to be altered; however, they regulated differentially expressed genes.

HLF encodes a member of the proline and acidic-rich protein family, a subset of the bZIP transcription factors. Chromosomal translocations fusing portions of this gene with the E2A gene cause a subset of childhood B-lineage acute lymphoid leukemias [32]. While HLF has been linked to malignancies of the lymphoid system, it is detected in the liver, kidney and adult nervous system by northern blotting [33]. Hitzler et al. found that HLF expression increased markedly with synaptogenesis and was coincident with barrel formation, and suggested that HLF plays a role in the function of differentiated neurons in the adult nervous system [34]. In our analysis, HLF appears as the most significant transcription factor related to the differential expression of genes in PD patients.

E2F1 is a member of the E2F family of transcription factors. The E2F family plays a crucial role in the control of the cell cycle and the action of tumor suppressor proteins, and is also a target of the transforming proteins of small DNA tumor viruses. Several studies have demonstrated that E2F1 contributes to neuronal damage and death using in vitro models of neurodegeneration [35], [36], [37]. E2F1 immunoreactivity and/or protein levels were reported to increase in neurons of patients with PD [38]. That study showed that the pRb/E2F pathway is activated in dopaminergic neurons in PD, and also demonstrated that activation of this pathway is instrumental in the degeneration of these neurons in the MPTP/MPP+ model of the disease [38]. In a recent study, Lu and colleagues showed that mutations in LRRK2 cause PD through inhibiting the translational repression of the transcription factors E2F1 and DP [39].

STAT4 is a transcription factor belonging to the signal transducer and activator of transcription protein family [40]. STAT4 is involved in the signaling of interleukin-12 and interferon-γ, as well as interleukin-23 [41]. Although we found that STAT4 was differentially expressed in PD patients compared to non-PD controls, the gene has no known role in PD pathogenesis to date.

Table 1 shows that the most significantly enriched pathway is the ribosome, which is responsible for catalyzing the formation of proteins from individual amino acids. Other pathways that we associate with protein synthesis, such as steroid biosynthesis and spliceosome, were also enriched. This result suggests that biological processes of protein turnover are impaired in PD. Our result is in line with previous studies [42], [43].

In conclusion, we have identified candidate molecular biomarkers for PD initiation using a computational bioinformatics analysis of gene expression. A total of 1004 differentially coexpressed genes were identified between PD patients and non-PD controls. Pathway enrichment of these genes suggests that biological processes of protein turnover are impaired in PD. After regulatory network construction and regulatory impact factor analysis, we found that the transcription factors HLF, E2F1, STAT4, NKX3-1, TAL1, RFX1 and EGR3 may play important roles in PD initiation. Of these, HLF, STAT4 and E2F1 were found to have altered expression levels in PD patients. Therefore, we suggest that HLF, E2F1 and STAT4 may be used as biomarkers for PD; however, more work is needed to validate our result.

Author Contributions

Conceived and designed the experiments: HD. Performed the experiments: XL. Analyzed the data: HD XL. Contributed reagents/materials/analysis tools: SH. Wrote the paper: YL.

References

  1. Foulds P, Mann DM, Mitchell JD, Allsop D (2010) Parkinson disease: Progress towards a molecular biomarker for Parkinson disease. Nat Rev Neurol 6: 359–361. doi: 10.1038/nrneurol.2010.78
  2. Jankovic J (2008) Parkinson’s disease: clinical features and diagnosis. J Neurol Neurosurg Psychiatry 79: 368–376. doi: 10.1136/jnnp.2007.131045
  3. Elbaz A, Dufouil C, Alperovitch A (2007) Interaction between genes and environment in neurodegenerative diseases. C R Biol 330: 318–328. doi: 10.1016/j.crvi.2007.02.018
  4. Douglas PM, Dillin A (2010) Protein homeostasis and aging in neurodegeneration. J Cell Biol 190: 719–729. doi: 10.1083/jcb.201005144
  5. Chade AR, Kasten M, Tanner CM (2006) Nongenetic causes of Parkinson’s disease. J Neural Transm Suppl 70: 147–151.
  6. Hashimoto M, Masliah E (1999) Alpha-synuclein in Lewy body disease and Alzheimer’s disease. Brain Pathol 9: 707–720. doi: 10.1111/j.1750-3639.1999.tb00552.x
  7. Hatano T, Kubo S, Sato S, Hattori N (2009) Pathogenesis of familial Parkinson’s disease: new insights based on monogenic forms of Parkinson’s disease. J Neurochem 111: 1075–1093. doi: 10.1111/j.1471-4159.2009.06403.x
  8. Devic I, Hwang H, Edgar JS, Izutsu K, Presland R, et al. (2011) Salivary alpha-synuclein and DJ-1: potential biomarkers for Parkinson’s disease. Brain 134: e178. doi: 10.1093/brain/awr015
  9. Waragai M, Wei J, Fujita M, Nakai M, Ho GJ, et al. (2006) Increased level of DJ-1 in the cerebrospinal fluids of sporadic Parkinson’s disease. Biochem Biophys Res Commun 345: 967–972. doi: 10.1016/j.bbrc.2006.05.011
  10. Maita C, Tsuji S, Yabe I, Hamada S, Ogata A, et al. (2008) Secretion of DJ-1 into the serum of patients with Parkinson’s disease. Neurosci Lett 431: 86–89. doi: 10.1016/j.neulet.2007.11.027
  11. Tokuda T, Salem SA, Allsop D, Mizuno T, Nakagawa M, et al. (2006) Decreased alpha-synuclein in cerebrospinal fluid of aged individuals and subjects with Parkinson’s disease. Biochem Biophys Res Commun 349: 162–166. doi: 10.1016/j.bbrc.2006.08.024
  12. Ohrfelt A, Grognet P, Andreasen N, Wallin A, Vanmechelen E, et al. (2009) Cerebrospinal fluid alpha-synuclein in neurodegenerative disorders-a marker of synapse loss? Neurosci Lett 450: 332–335. doi: 10.1016/j.neulet.2008.11.015
  13. Hong Z, Shi M, Chung KA, Quinn JF, Peskind ER, et al. (2010) DJ-1 and alpha-synuclein in human cerebrospinal fluid as biomarkers of Parkinson’s disease. Brain 133: 713–726. doi: 10.1093/brain/awq008
  14. Nagatsu T, Sawada M (2005) Inflammatory process in Parkinson’s disease: role for cytokines. Curr Pharm Des 11: 999–1016. doi: 10.2174/1381612053381620
  15. Sato S, Mizuno Y, Hattori N (2005) Urinary 8-hydroxydeoxyguanosine levels as a biomarker for progression of Parkinson disease. Neurology 64: 1081–1083. doi: 10.1212/01.wnl.0000154597.24838.6b
  16. Godau J, Herfurth M, Kattner B, Gasser T, Berg D (2010) Increased serum insulin-like growth factor 1 in early idiopathic Parkinson’s disease. J Neurol Neurosurg Psychiatry 81: 536–538. doi: 10.1136/jnnp.2009.175752
  17. Kanehisa M (2002) The KEGG database. Novartis Foundation symposium 247: 91–101; discussion 101–103, 119–128, 244–152.
  18. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27–30. doi: 10.1093/nar/28.1.27
  19. Stuart JM, Segal E, Koller D, Kim SK (2003) A gene-coexpression network for global discovery of conserved genetic modules. Science 302: 249–255. doi: 10.1126/science.1087447
  20. Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P (2004) Coexpression analysis of human genes across many microarray data sets. Genome Res 14: 1085–1094. doi: 10.1101/gr.1910904
  21. Bergmann S, Ihmels J, Barkai N (2004) Similarities and differences in genome-wide expression data of six organisms. PLoS Biol 2: E9. doi: 10.1371/journal.pbio.0020009
  22. Liu BH, Yu H, Tu K, Li C, Li YX, et al. (2010) DCGL: an R package for identifying differentially coexpressed genes and links from gene expression microarray data. Bioinformatics 26: 2637–2638. doi: 10.1093/bioinformatics/btq471
  23. Yu H, Liu BH, Ye ZQ, Li C, Li YX, et al. (2011) Link-based quantitative methods to identify differentially coexpressed genes and gene pairs. BMC Bioinformatics 12: 315. doi: 10.1186/1471-2105-12-315
  24. Team RDC (2011) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
  25. Benjamini Y, Hochberg Y (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological) 57: 289–300.
  26. Huang da W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4: 44–57. doi: 10.1038/nprot.2008.211
  27. Reverter A, Hudson NJ, Nagaraj SH, Perez-Enciso M, Dalrymple BP (2010) Regulatory impact factors: unraveling the transcriptional regulation of complex traits from expression data. Bioinformatics 26: 896–904. doi: 10.1093/bioinformatics/btq051
  28. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504. doi: 10.1101/gr.1239303
  29. Cooper CS, Campbell C, Jhavar S (2007) Mechanisms of Disease: biomarkers and molecular targets from microarray gene expression studies in prostate cancer. Nat Clin Pract Urol 4: 677–687. doi: 10.1038/ncpuro0946
  30. Scherzer CR, Eklund AC, Morse LJ, Liao Z, Locascio JJ, et al. (2007) Molecular markers of early Parkinson’s disease based on gene expression in blood. Proc Natl Acad Sci U S A 104: 955–960. doi: 10.1073/pnas.0610204104
  31. Guttula SV, Allam A, Gumpeny RS (2012) Analyzing microarray data of Alzheimer’s using cluster analysis to identify the biomarker genes. Int J Alzheimers Dis 2012: 649456. doi: 10.1155/2012/649456
  32. Honda H, Inaba T, Suzuki T, Oda H, Ebihara Y, et al. (1999) Expression of E2A-HLF chimeric protein induced T-cell apoptosis, B-cell maturation arrest, and development of acute lymphoblastic leukemia. Blood 93: 2780–2790.
  33. Inaba T, Roberts WM, Shapiro LH, Jolly KW, Raimondi SC, et al. (1992) Fusion of the leucine zipper gene HLF to the E2A gene in human acute B-lineage leukemia. Science 257: 531–534. doi: 10.1126/science.1386162
  34. Hitzler JK, Soares HD, Drolet DW, Inaba T, O’Connel S, et al. (1999) Expression patterns of the hepatic leukemia factor gene in the nervous system of developing and adult mice. Brain Res 820: 1–11. doi: 10.1016/s0006-8993(98)00999-8
  35. Hou ST, Xie X, Baggley A, Park DS, Chen G, et al. (2002) Activation of the Rb/E2F1 pathway by the nonproliferative p38 MAPK during Fas (APO1/CD95)-mediated neuronal apoptosis. J Biol Chem 277: 48764–48770. doi: 10.1074/jbc.m206336200
  36. Jiang SX, Sheldrick M, Desbois A, Slinn J, Hou ST (2007) Neuropilin-1 is a direct target of the transcription factor E2F1 during cerebral ischemia-induced neuronal death in vivo. Mol Cell Biol 27: 1696–1705. doi: 10.1128/mcb.01760-06
  37. Smith RA, Walker T, Xie X, Hou ST (2003) Involvement of the transcription factor E2F1/Rb in kainic acid-induced death of murine cerebellar granule cells. Brain Res Mol Brain Res 116: 70–79. doi: 10.1016/s0169-328x(03)00253-5
  38. Hoglinger GU, Breunig JJ, Depboylu C, Rouaux C, Michel PP, et al. (2007) The pRb/E2F cell-cycle pathway mediates cell death in Parkinson’s disease. Proc Natl Acad Sci U S A 104: 3585–3590. doi: 10.1073/pnas.0611671104
  39. Gehrke S, Imai Y, Sokol N, Lu B (2010) Pathogenic LRRK2 negatively regulates microRNA-mediated translational repression. Nature 466: 637–641. doi: 10.1038/nature09191
  40. Yamamoto K, Quelle FW, Thierfelder WE, Kreider BL, Gilbert DJ, et al. (1994) Stat4, a novel gamma interferon activation site-binding protein expressed in early myeloid differentiation. Mol Cell Biol 14: 4342–4349.
  41. Bacon CM, Petricoin EF, 3rd, Ortaldo JR, Rees RC, Larner AC, et al. (1995) Interleukin 12 induces tyrosine phosphorylation and activation of STAT4 in human lymphocytes. Proc Natl Acad Sci U S A 92: 7307–7311. doi: 10.1073/pnas.92.16.7307
  42. Rubinsztein DC (2006) The roles of intracellular protein-degradation pathways in neurodegeneration. Nature 443: 780–786. doi: 10.1038/nature05291
  43. Davie CA (2008) A review of Parkinson’s disease. Br Med Bull 86: 109–127. doi: 10.1093/bmb/ldn013