MK4MDD: A Multi-Level Knowledge Base and Analysis Platform for Major Depressive Disorder

Background: Major depressive disorder (MDD) is a complex, highly heterogeneous neuropsychiatric syndrome. Biological components at different levels underlie MDD and interact with each other. To uncover the disease mechanism, large numbers of studies have been conducted at these different levels. There is a growing need to integrate data from multiple levels of research into a database that provides a systematic review of current research results. Cross-level integration will also help bridge the gaps between research levels and further the understanding of MDD. So far, there has been no such effort for MDD.
Descriptions: We offer researchers a Multi-level Knowledge base for MDD (MK4MDD) to study the interplay of components in the pathophysiological cascade of MDD, from genetic variations to diagnostic syndrome. MK4MDD contains 2,341 components and 5,206 relationships between components, based on reported experimental results obtained by literature reading with manual curation. All components were classified with careful curation and supplementary annotation. The search and visualization tools link all data in MK4MDD into a cross-linked network that can be applied to a broad range of both basic and applied research.
Conclusions: MK4MDD aims to provide researchers with a central knowledge base and analysis platform for research on the etiological and pathophysiological mechanisms of MDD. MK4MDD is freely available at http://mdd.psych.ac.cn.


Introduction
Major depressive disorder (MDD) is a common neuropsychiatric syndrome with a lifetime prevalence of ~17% of the population worldwide [1]. It has been predicted to become the second leading cause of disability worldwide by 2020 [2], imposing a high global burden. As a complex disease, MDD is influenced by both genetic and environmental factors, with a heritability of approximately 37% [3]. It also presents symptomatic heterogeneity and involves changes in multiple systems [4,5]. The complexity of the causes and mechanisms of MDD makes its early detection, treatment and prognosis difficult.
In the last decade, supported by theoretical and technological developments, substantial advances have been made in MDD research. For example, several hypotheses on the pathophysiological mechanisms of MDD have been supported by evidence from molecular neurobiology [5,6]; large numbers of genetic studies, especially several genome-wide association studies, have been carried out to identify MDD risk loci [7,8,9,10,11,12,13]; many studies have examined gene-environment interplay in MDD [14,15,16]; and a series of new methods and research models have been applied in psychiatric research, such as next-generation sequencing [17], neuroimaging [18], and animal models [19]. Despite these advances, various challenges remain. Towards the goal of uncovering the mechanisms of psychiatric disease, researchers from multiple disciplines tend, on the one hand, to narrow research targets to specific components of disease processes, thereby potentially improving the focus of investigation [20]; on the other hand, there is an increasing trend to integrate multiple levels of research data and technologies towards a systematic understanding of disease psychopathology [20,21]. For example, studies have examined the influence of genetic variation on gene expression and on brain structure and function in psychiatric disorders [21,22,23,24]. All these efforts have accelerated the accumulation of related data and advanced our knowledge of MDD.
However, results from different fields of research are scattered across numerous publications, and a systematic review and collection of the currently available data and knowledge is lacking. Meanwhile, research at different levels remains fragmented, which hinders interdisciplinary research that articulates multiple levels of analysis. One of the biggest obstacles to solving these problems is the lack of tools to manage the complexity of data rapidly being accumulated across widely disparate methods, models and data types [25]. A database that integrates data from different research levels would therefore greatly facilitate MDD studies and provide unique insights into how the dynamic interplay between different levels of data shapes individual risk for psychopathology. Some databases for psychiatric disorders are currently available, including AutDB for autism [26], SZGene [27] and SZGR [28] for schizophrenia, and ADHDgene for attention deficit/hyperactivity disorder [29]. Although some disease-related databases, such as SLEP [30] and HuGE Navigator [31], collect genetic information for a set of disorders including MDD, they are not MDD specific. A recent publication reported a prioritized gene list (DEPgenes) for MDD obtained by gene prioritization analysis [32], but the result comes from prediction rather than from the literature, and the data are not available through a database or as a download. More importantly, all the currently available databases focus on the genetic basis of psychiatric disorders. MDD is a complex neuropsychiatric syndrome with great heterogeneity. Beyond the genetic basis, it is critically important to cover multiple levels of biological characteristics in order to establish a comprehensive multi-level knowledge base for MDD. The Multi-level Knowledge base for MDD (MK4MDD) has thus been developed as an innovative informatics tool to integrate different levels of data from published experimental studies of MDD.
MK4MDD provides researchers with a knowledge network that contains integrated data and the interplay between data of different levels. It is also an analysis platform on which customized online analysis with interactive visualization can be performed. By developing MK4MDD, we aim to facilitate a broad range of work in both basic MDD research and practical applications, and ultimately to facilitate an understanding of the MDD mechanism and the development of effective means for disease diagnosis, treatment and prognosis. Our innovative database schema could serve as a framework for the study of other complex psychiatric diseases.

Literature Search and Data Extraction
MK4MDD aims to provide multi-level data that cover the pathophysiological cascade of MDD from genetic variations to diagnostic syndrome. The pathophysiological cascade was divided into the levels of gene, protein, cellular system/signaling pathway, neural system, cognition, and symptom with reference to the paper published by Cannon TD et al [33] [5,6,26,27,28,29,33,34,35,36,37,38,39,40,41] which were elaborated in the supplementary file. The searches resulted in 5,886 publications from the year 2001 through March 1st, 2012. By manual screening of these publications, MDD-related genetic studies, epigenetic studies, functional studies including animal models, imaging studies and psychological studies focusing on the etiological and pathophysiological mechanisms of MDD were retained. Studies on the reliability of psychometric scales, epidemiology, and the efficacy of antidepressant drugs or treatments were not included. After filtering, 1,462 articles were included in MK4MDD.
The abstract of each eligible publication was read carefully, and data were extracted in strict adherence to the original publication. In MK4MDD, a specific piece of data at a given research level is defined as a 'component' of the disease process, and an association between two components, at either the same research level or different levels, is defined as a 'relationship'. For example, one study reported that 'significantly smaller hippocampal volumes were observed for patients and for controls carrying the Met-BDNF allele compared with subjects homozygous for the Val-BDNF allele (P = 0.006).' [42]. From this result, two components can be extracted, the BDNF gene and the hippocampus, and one relationship between them can be established as described in the original paper. For relationships, detailed information such as statistical values (e.g. P-value, odds ratio, confidence interval) and a description of the relation was extracted. To facilitate understanding of relationships, other details of the study, including background, sample information, method, result and conclusion, are also provided. Moreover, environmental events that might be putative risk factors for MDD, reported in publications retrieved with the keywords in the table in Supplementary File S1, were also included in the database.
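The component and relationship definitions above can be pictured as two small record types. The following is a minimal sketch rather than the MK4MDD schema: the class names, field names and the `reference` label are assumptions made for illustration, and the only record shown is the BDNF and hippocampus example quoted in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    """One piece of data at one research level (hypothetical field names)."""
    name: str
    level: str
    data_type: str


@dataclass(frozen=True)
class Relationship:
    """A reported association between two components (hypothetical fields)."""
    source: Component
    target: Component
    description: str
    p_value: Optional[float] = None   # statistical value, when one is reported
    reference: Optional[str] = None   # citation label as used in the text


# The example from the text: Met-BDNF carriers and hippocampal volume.
bdnf_gene = Component("BDNF", "genetic/epigenetic locus", "gene")
hippocampus = Component("hippocampus", "neural system",
                        "brain morphology and function")

rel = Relationship(
    source=bdnf_gene,
    target=hippocampus,
    description="Smaller hippocampal volumes in Met-BDNF allele carriers "
                "than in Val-BDNF homozygotes",
    p_value=0.006,
    reference="[42]",
)

# A relationship may link components at the same level or at different levels.
crosses_levels = rel.source.level != rel.target.level
```

Keeping the statistical value optional reflects the fact that not every extracted relationship necessarily reports one; how the real database handles missing values is not stated in the text.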

Data Integration and Analysis
The data from the literature were classified into seven research levels according to the pathophysiological cascade of MDD, and data at different levels were integrated through relationships between components. The seven levels are: (1) genetic/epigenetic locus, (2) protein and other molecule, (3) cell and molecular pathway, (4) neural system, (5) cognition and behavior, (6) symptoms and signs, and (7) environment. The seven research levels were further divided into 14 data types. Detailed descriptions of data types and research levels are shown in Table 1. Each component in MK4MDD is assigned to the appropriate data type at the appropriate level. For instance, the two components in the above example, the BDNF gene and the hippocampus, are classified into the 'gene' type at the 'genetic/epigenetic locus' level and the 'brain morphology and function' type at the 'neural system' level, respectively.
Each component was carefully curated and annotated in a manual or semi-manual manner by reading reviews and textbooks or by searching databases. For example, component names were standardized according to common databases or knowledge, such as approved gene symbols in HGNC [43] and approved protein names in UniProt [44]. Detailed descriptions were provided for some types of data, such as diagrammatic presentations for neural system components and morphological and functional annotation for brain [45,46,47]. For SNPs, genes, proteins and molecular pathways, supplementary annotations were added to allow a deeper interpretation of current data, as some important annotations were not provided directly in the original publications. The supplementary annotations include functional annotation of SNPs (non-synonymous coding SNPs, or SNPs leading to gain or loss of a stop codon) using dbSNP [48] and Ensembl [49]; mapping of SNPs to genes according to their chromosomal locations; annotation of genes using gene ontology (GO) [50] and pathways from KEGG [51], BioCarta (http://www.biocarta.com/genes/index.asp) and Reactome [52]; GO enrichment analysis of genes with DAVID [53,54]; mapping of genes to proteins using the UniProt [44] database; and identification of interactions between genes using the HPRD [55] database. In addition, hot components (defined as components with at least N studies, where the threshold N differs between data types) were identified for each data type to provide reliable candidates for further research. These supplementary analyses enrich the content of the database and provide new clues for understanding the genetic basis and molecular mechanism of MDD. The process of data integration and analysis, as well as the data schema showing the different levels of data and the relationships among them, is shown in Figure 1.
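The hot-component rule (at least N studies, with N depending on the data type) can be illustrated with a short counting routine. This is a sketch under stated assumptions: the function name, the record layout, the study identifiers and the threshold values are all hypothetical, since the text does not give the actual thresholds or the way studies were counted.

```python
from collections import defaultdict


def hot_components(records, thresholds):
    """Return components reported in at least N distinct studies.

    records: iterable of (component, data_type, study_id) tuples.
    thresholds: dict mapping data_type -> N (placeholder values here).
    """
    studies = defaultdict(set)
    for component, data_type, study_id in records:
        # A set ensures that each study is counted once per component.
        studies[(component, data_type)].add(study_id)

    hot = [
        (component, data_type, len(ids))
        for (component, data_type), ids in studies.items()
        if data_type in thresholds and len(ids) >= thresholds[data_type]
    ]
    # Most-studied components first, ties broken by name.
    return sorted(hot, key=lambda row: (-row[2], row[0]))


# Illustrative records only; the study identifiers are made up.
records = [
    ("BDNF", "gene", "s1"),
    ("BDNF", "gene", "s2"),
    ("BDNF", "gene", "s2"),  # repeated report from the same study
    ("BDNF", "gene", "s3"),
    ("GENE_X", "gene", "s4"),
    ("hippocampus", "brain morphology and function", "s1"),
    ("hippocampus", "brain morphology and function", "s5"),
]
thresholds = {"gene": 3, "brain morphology and function": 2}

result = hot_components(records, thresholds)
```

Counting distinct studies rather than raw rows is a design choice of this sketch; it keeps a single publication that reports several findings from inflating a component's count.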

Data Content and Data Access
The data set of MK4MDD contains 2,341 components and 5,206 relationships, all based on reported experimental results from the literature. Statistics for each type of data are shown in Table 2. Supplementary annotations for components at the 'genetic/epigenetic locus' level and the 'protein and other molecule' level are included in MK4MDD. All MK4MDD data are stored and managed in a MySQL relational database. To access the data, MK4MDD provides a user-friendly interface with search and visualization tools developed in Java/JSP and running on an Apache Tomcat web server. Several search options facilitate access to the MDD data network. The 'Multi-level Search' allows users to search published reports on a given component of interest across all seven research levels, so that researchers can obtain a systematic review of the current research status of the component. A blank input returns a list of all MDD-related components classified by research level, giving a panoramic overview of MDD studies. Among the seven research levels, in consideration of the genetic complexity of MDD, a special 'Genetic/Epigenetic Locus Search' module is designed to facilitate a thorough investigation of genetic or epigenetic data. In particular, to demonstrate data connections, 'Cross Data Search' is implemented to search relationships between components at different levels. For example, selecting 'Gene' as the level of component A and 'Brain Morphology and Function' as the level of component B, together with 'Positive' in the 'Relation Type' field and 'Patients study' and 'Patients and normal controls study' in the 'Study Type' field, returns 95 positive relationships between genes and components of brain morphology and function from studies of patients or of patients and normal controls.
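A query of the 'Cross Data Search' kind can be sketched against a toy relational schema. Everything below is an assumption made for illustration: the table and column names are hypothetical, Python's built-in sqlite3 module stands in for the MySQL database mentioned in the text, the rows are made up, and the 'Negative' and 'Animal study' values are placeholders that do not come from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE component (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    data_type TEXT NOT NULL
);
CREATE TABLE relationship (
    id INTEGER PRIMARY KEY,
    component_a INTEGER NOT NULL REFERENCES component(id),
    component_b INTEGER NOT NULL REFERENCES component(id),
    relation_type TEXT NOT NULL,
    study_type TEXT NOT NULL
);
""")

# Made-up rows for illustration only.
conn.executemany("INSERT INTO component VALUES (?, ?, ?)", [
    (1, "BDNF", "Gene"),
    (2, "hippocampus", "Brain Morphology and Function"),
    (3, "BDNF", "Protein"),
])
conn.executemany("INSERT INTO relationship VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2, "Positive", "Patients and normal controls study"),
    (2, 1, 2, "Negative", "Patients study"),   # placeholder relation type
    (3, 3, 2, "Positive", "Patients study"),
    (4, 1, 2, "Positive", "Animal study"),     # placeholder study type
])


def cross_data_search(conn, type_a, type_b, relation_type, study_types):
    """Find relationships between two data types, filtered by relation and study type."""
    placeholders = ", ".join("?" for _ in study_types)
    query = f"""
        SELECT a.name, b.name, r.study_type
        FROM relationship AS r
        JOIN component AS a ON a.id = r.component_a
        JOIN component AS b ON b.id = r.component_b
        WHERE a.data_type = ? AND b.data_type = ?
          AND r.relation_type = ?
          AND r.study_type IN ({placeholders})
        ORDER BY r.id
    """
    params = [type_a, type_b, relation_type, *study_types]
    return conn.execute(query, params).fetchall()


rows = cross_data_search(
    conn, "Gene", "Brain Morphology and Function", "Positive",
    ["Patients study", "Patients and normal controls study"],
)
```

All user-supplied values are passed as bound parameters; only the number of `?` placeholders is interpolated into the SQL string, which is a common way to build an `IN` clause safely.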
Table 1 (columns: research level, data type, description)

Cell and molecular pathway
  Molecular pathway: Intracellular or intercellular molecular pathways that have been recorded in GO [42], BioCarta (http://www.biocarta.com/genes/index.asp), or KEGG [43].
  Cell: Cell types such as pyramidal neurons and lymphocytes.
Neural system
  Neurobiological system: Electroneurographic signals and comprehensive systems that usually contain multiple molecules, molecular pathways and cells (such as a transmitter system).
  Brain morphology and function: MDD-related structural and functional brain changes.
Cognition and behavior
  Cognitive impairments and cognitive characteristics of MDD patients, as well as depression/anxiety-like behaviors from animal models.
Symptoms and signs
  Symptoms: Diagnostic symptoms for MDD in DSM-IV (except cognitive impairments).
  Signs: Clinical signs, such as blood pressure or heart rate.
Environment
  Environmental events that are putative risk factors for MDD.
doi:10.1371/journal.pone.0046335.t001

MK4MDD provides a detailed report for each component. To help users build a customized data network, MK4MDD provides a unique module named 'My Relationship Set', which allows users to assemble their own relationship set by clicking the 'ADD' button on selected positive relationships in either a 'Component Report' or the search results of 'Cross Data Search'. Users can add not only literature-origin relationships to 'My Relationship Set' but also relationships from supplementary annotation. On the 'My Relationship Set' page, users can further edit the selected relationships and generate a graphical data network, in which relationships of literature origin and relationships from supplementary annotation are drawn with solid and dotted lines, respectively. Users can drag or click on the nodes and edges for interactive operations and analysis. Users may also save their relationship set with the download function and upload it later for further analysis.
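The behaviour of a user-built relationship set (adding relationships once, distinguishing the two origins by line style, and saving and reloading the set) can be sketched in a few functions. This is not the MK4MDD implementation: the function names, the JSON layout and the choice to treat edges as undirected are assumptions, and the two edges used are the BDNF relationships described in the Application section.

```python
import json

ORIGINS = ("literature", "annotation")


def add_relationship(rel_set, a, b, origin):
    """Add an edge once; edges are treated as undirected (an assumption)."""
    if origin not in ORIGINS:
        raise ValueError(f"unknown origin: {origin}")
    key = (tuple(sorted((a, b))), origin)
    rel_set.setdefault(key, {"a": a, "b": b, "origin": origin})


def edge_style(origin):
    """Literature-origin edges are solid, annotation edges are dotted."""
    return "solid" if origin == "literature" else "dotted"


def export_set(rel_set):
    """Serialize the set to JSON text, standing in for the download function."""
    return json.dumps(list(rel_set.values()), sort_keys=True)


def import_set(text):
    """Rebuild a set from JSON text, standing in for the upload function."""
    rel_set = {}
    for edge in json.loads(text):
        add_relationship(rel_set, edge["a"], edge["b"], edge["origin"])
    return rel_set


my_set = {}
add_relationship(my_set, "BDNF protein", "CREB protein", "literature")
add_relationship(my_set, "CREB protein", "BDNF protein", "literature")  # repeat, ignored
add_relationship(my_set, "BDNF protein", "BDNF gene", "annotation")

styles = sorted(edge_style(edge["origin"]) for edge in my_set.values())
restored = import_set(export_set(my_set))
```

Keying each edge on its sorted endpoints plus its origin is one way to obtain non-repeated relationships when the same pair is added from several search results.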

Application
MK4MDD is an innovative informatics tool to facilitate studies of the etiological and pathophysiological mechanisms of MDD. Because MK4MDD manages and integrates complex data and data connections, researchers may start from a single component of interest and obtain a knowledge network across different research levels. MK4MDD will have a broad range of applications, in both basic and applied research, to advance our knowledge of the MDD mechanism and to support the development of effective means for early detection and treatment of the disease.
For basic research, MK4MDD provides a systematic review of MDD-related components, which will help to advance hypothesis-driven research. Here we use an example to demonstrate how MK4MDD facilitates the generation of new questions. It is well known that the molecular mechanisms of MDD, especially the roles that proteins play in MDD, are still unclear. To investigate this, it is necessary to get an overview of the current research status (i.e. reported data and the relationships between data) and then derive a new hypothesis. First, using 'Cross Data Search', we searched for positive relationships between 'Protein' (Component A) and 'Neural system', 'Cognition and Behavior', 'Symptom and Signs' (Component B) (Figure 2 (a)). There were a total of 144 results. Second, we added all search results to 'My Relationship Set' to visualize the components and relationships; there were a total of 132 non-repeated relationships (Figure 2 (b)). Based on the graphical presentation, we found that brain-derived neurotrophic factor (BDNF) is one of the key focal nodes, so we selected BDNF for further study and deleted the previous 132 relationships from the broad search to improve the focus of investigation. From the 'Protein Report' for BDNF, we obtained 16 components related to BDNF protein, as well as the relationship between BDNF protein and BDNF gene from supplementary annotation (Figure 2 (c)). From the 'Gene Report' for BDNF gene, we obtained 12 components related to BDNF gene. We added all 28 relationships surrounding BDNF protein and BDNF gene to 'My Relationship Set'. This iterative analytical process finally resulted in 29 relationships (including the relationship between BDNF protein and BDNF gene) in 'My Relationship Set'.
From the graphical presentation of components and relationships (Figure 2 (d)), we found that BDNF protein is connected with the protein CREB (cyclic AMP-responsive element-binding protein 1) and that BDNF gene is connected with both the protein CREB and the brain region amygdala. Based on this information, we may propose several questions. For example, does BDNF contribute to MDD by regulating CREB in the amygdala? What cognitive impairments are caused by molecular activation of this mechanism? Do BDNF-related symptoms (depressed mood, anhedonia and suicide) arise from this potential mechanism? New questions will help drive new findings and accelerate our knowledge of the MDD mechanism.
Another example of application is the study of endophenotypes. As researchers tend to improve the focus of investigation by examining specific components of disease processes, the study of endophenotypes is becoming increasingly important for uncovering disease mechanisms and developing effective methods for disease diagnosis. The term 'endophenotype' is defined as an internal phenotype that fills the gap between available descriptors, the gene and the elusive disease process [57]. An endophenotype should fulfill the following criteria: (1) associated with the illness in the population, (2) heritable, (3) state-independent, (4) found in unaffected family members at a higher rate than in the general population, and (5) shown to cosegregate with the illness within families [58]. As MK4MDD collects different types of disease-related components in MDD patients, unaffected first-degree relatives and healthy controls, it will assist in establishing the biological or clinical plausibility of a putative endophenotype. For example, through 'Cross Data Search', we found six components in the neurobiological system associated with MDD by setting the 'Study Type' to 'High-risk people study'. The results provided evidence of disease association, familial association and heritability for the six neurobiological components. Among the six components, we focused on an event-related component named P300. From the 'Neurobiological Component Report' for P300, we found a reference supporting the state-independence of P300 [59], and the data network of P300 showed that P300 is associated with the protein S100B and with psychotic symptoms, which supports its biological and clinical plausibility. Based on all of the above evidence, we may choose P300 as a potential endophenotype for further study.
Facilitating the discovery of potential biomarkers is an important application of MK4MDD in applied research. The National Institutes of Health (NIH) defines a biomarker as 'a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention' [60]. Generally, a biomarker is anything that can be used as an indicator of a particular disease state [60]. In recent years, continuous efforts have been made to find suitable biomarkers as diagnostic or therapeutic targets [61,62,63]. To provide new clues on biomarkers, MK4MDD offers a list of potential biomarkers obtained by analyzing the current research data. Three principles were applied in the analysis: 1) the component is directly related to MDD and reflects differences between patients and normal controls; 2) it is measured in peripheral blood, cerebrospinal fluid or other common clinical samples, or recorded by neuroimaging methods; and 3) it is expressed in a verifiable way (such as protein concentration, SNP genotype or brain volume). For example, using magnetic resonance spectroscopy (MRS), a study observed significantly reduced glutamate levels in the patients' left cingulum compared to healthy controls [64]. The study showed that glutamate is related to MDD and can be measured and evaluated by MRS, so it can be regarded as a potential biomarker. According to our analysis, there are a total of 480 potential biomarkers at different research levels. Users may search potential biomarkers by specifying the name or by defining research levels or test methods. Detailed evidence for each potential biomarker is provided in the corresponding 'Component Report'.
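The three screening principles can be written as a simple predicate over candidate records. This is only a sketch of the logic as described in the text, not the analysis the authors ran: the field names, the simplified set of accepted measurement sources and the second candidate are hypothetical, while the glutamate record mirrors the MRS example above.

```python
from dataclasses import dataclass

# Simplified stand-in for principle 2; the text also allows other common
# clinical samples, which are not enumerated there.
ACCEPTED_SOURCES = {"peripheral blood", "cerebrospinal fluid", "neuroimaging"}


@dataclass(frozen=True)
class Candidate:
    """A component considered as a potential biomarker (hypothetical fields)."""
    name: str
    differs_between_patients_and_controls: bool  # principle 1
    measurement_source: str                      # principle 2
    readout: str                                 # principle 3, empty if none


def is_potential_biomarker(candidate):
    """Apply the three principles in turn; all must hold."""
    return (
        candidate.differs_between_patients_and_controls
        and candidate.measurement_source in ACCEPTED_SOURCES
        and bool(candidate.readout)
    )


# The glutamate example from the text: measured by MRS, reduced in patients.
glutamate = Candidate("glutamate", True, "neuroimaging", "metabolite level")

# A made-up counterexample with no objective measurement source or readout.
hypothetical = Candidate("self-described fatigue", True, "interview", "")

flags = {c.name: is_potential_biomarker(c) for c in (glutamate, hypothetical)}
```

In practice each principle would need curated evidence behind it; the boolean fields here simply record the outcome of that curation.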

Discussion and Future Development
MK4MDD contains highly integrated knowledge from PubMed with manual curation and supplementary annotation. It has three notable features: 1) MK4MDD is the first database specific to MDD and contains multi-level data. Different types of MDD-related data were collected and categorized into seven levels in the pathophysiological cascade of MDD, from genetic variations to diagnostic syndrome, which allows researchers to acquire systematic knowledge of MDD from the database. 2) MK4MDD emphasizes the interplay of different levels of data across widely disparate disciplines, methods and data types by collecting literature-reported relationships, so that all data in MK4MDD form a cross-linked data network. Researchers may start from a single type of data (e.g. gene or protein) and obtain a knowledge network from the molecular level to the mind level by navigating MK4MDD. 3) MK4MDD is not only a knowledge resource but also an analysis platform for interdisciplinary research that can accelerate the pace of new discovery in psychiatric mechanisms. The search and visualization tools give access to both data and data connections for the identification of novel targets and questions. Through the 'My Relationship Set' module, researchers can easily target, evaluate, and prioritize MDD-related components of interest for future research.
It is worth mentioning that all components and relationships included in MK4MDD are based on experimental results for MDD reported in original articles. Relationships inferred by review articles, and studies of people with depressed mood or trait who were not diagnosed patients, were not included in MK4MDD. In the next few years, the number of MDD studies is expected to keep increasing, especially with the development of new research strategies and new technologies. MK4MDD will be periodically updated to keep up with the research progress on MDD. Moreover, the current version of MK4MDD focuses on core pathophysiological components of MDD, so social and environmental factors are not a current focus. In the future, MK4MDD will expand its coverage to these fields. Meanwhile, we will extend the functionality of MK4MDD by adding new modules for endophenotype studies and gene-environment interaction studies. MK4MDD is dedicated to providing a powerful and useful MDD resource for researchers. However, it is a real challenge to cover all relevant terms for such a complex human disorder with broad clinical manifestations and multiple risk factors. We encourage researchers to propose new search terms or to upload articles by providing PMIDs via the specially designed 'Upload' module on the website. As the first multi-level knowledge base for a complex psychiatric disease, MK4MDD demonstrates a novel framework for interdisciplinary research, which could be applied to other psychiatric disorders, such as schizophrenia and bipolar disorder. We believe that with the support of bioinformatics tools and the emergence of new data and knowledge, future investigations will accelerate the pace of new discoveries. We hope our continuous efforts will help to unveil the psychiatric mechanism of MDD and contribute to global mental health.

Supporting Information
Supplementary File S1 Criteria for keywords selection and all keywords employed to search MDD related publications in PubMed for MK4MDD.