HSC-Explorer: A Curated Database for Hematopoietic Stem Cells

HSC-Explorer (http://mips.helmholtz-muenchen.de/HSC/) is a publicly available, integrative database containing detailed information about the early steps of hematopoiesis. The resource aims to provide fast and easy access, through visualization interfaces, to relevant information from the wealth of publications in the field, in particular to the complex network of interacting cell types and molecules. It provides structured information on more than 7000 experimentally validated interactions between molecules, bioprocesses and environmental factors. Information is derived manually by expert annotators through critical reading of the scientific literature. Hematopoiesis-relevant interactions are accompanied by context information, such as model organisms and experimental methods, that enables assessment of the reliability and relevance of experimental results. The use of established vocabularies facilitates downstream bioinformatics applications and the conversion of results into complex networks. Several predefined datasets (Selected topics) offer insights into stem cell behavior, the stem cell niche and signaling processes supporting hematopoietic stem cell maintenance. HSC-Explorer provides a versatile web-based resource for scientists entering the field of hematopoiesis, enabling users to inspect the associated biological processes through interactive graphical presentation.


Introduction
The term "hematopoiesis" describes the life-long regeneration and repair of the blood system. All blood cells are ultimately generated from multipotent hematopoietic stem cells (HSCs), which are the only cell type capable of long-term (if not life-long) self-renewal (i.e. generation of daughter cells with HSC potential). According to the classical view of hematopoiesis, HSCs generate multipotent and committed progenitors, which produce terminally differentiated cells. Characterized by a massive production rate (10^11-10^12 blood cells per day in an adult human), hematopoiesis is tightly regulated by intrinsic mechanisms as well as extrinsic cues, which balance various cellular behaviors such as quiescence, self-renewal, differentiation, homing and migration.
In order to study these behaviors, hematologists have been enriching hematopoietic stem cells for over 25 years using various purification strategies based on flow cytometry and functional in vitro and in vivo assays [1,2]. While in the 1990s stem cell enrichment was dominated by the use of three to four markers (cKIT, Sca-1, CD34 and a mixture of several blood lineage-specific markers), reaching purities of at least 20% [2], technical advances during the last decade have made it possible to distinguish subpopulations with theoretically up to 17 markers [3]. The utilization of additional markers in recent years has led to the emergence of a variety of purification strategies yielding stem cell purities over 50% [4,5]. Although some of these strategies are closely related, others utilize a completely different set of markers. Whether and to what extent the results of different purification strategies are comparable is unclear. A comparison of the gene expression profiles of HSC populations purified using different enrichment protocols suggests that comparability might be limited [6].
In addition, the criteria for cells to be classified as HSCs keep changing every few years (i.e. length of repopulation/contribution upon transplantation). Terminologies like Long-Term (LT) and Short-Term (ST) HSCs are functionally defined (LT-HSCs repopulate mice for longer than 16 weeks, ST-HSCs for less than 12 weeks) and do not necessarily correlate with certain enrichment protocols. However, the use of such terms is not consistent throughout the literature. This inconsistency, together with the intrinsic heterogeneity of the HSC compartment [7,8], hampers proper interpretation and direct comparison of results from different publications. For this reason, a comprehensive understanding of the current knowledge in the field requires the collection of the various experimental results within a unified resource.
Currently, hematopoiesis-specific databases such as Hematopoietic Fingerprints [9], HemoPDB [10] or StemBase [11] collecting information about gene expression or transcriptional regulation in hematopoiesis are available. However, there is a demand for a resource that provides information about the interactions between cellular components and signaling processes characterizing the diverse stem cell subpopulations isolated so far and their stem cell-related functions also in context with the stem cell niche.
Here we present HSC-Explorer, a publicly available, manually curated, integrative database collecting literature-derived knowledge about the different hematopoietic stem cell subpopulations and their behavior in repopulation activity, self-renewal and quiescence and how these processes are regulated by intrinsic and extrinsic factors. The resource covers in particular the early steps of differentiation from the most primitive hematopoietic stem cells (HSC) to more differentiated multipotent progenitor cells (MPP) in adult mice. Multiple search options and an interactive graphical tool enable information retrieval of the manifold interrelations between factors and processes and their presentation as informative network structures.

Curation of Information from Hematopoietic Literature
The complete content of the database is generated by biocurators who manually extract hematopoiesis-specific and experimentally verified information from the scientific literature. Annotation is performed according to the procedures applied in our CIDeR database [12] and, if required, adapted to the peculiarities of hematopoiesis. Information in HSC-Explorer is described using three types of information (Figure 1): general information (Figure 1A), textual information (comment) (Figure 1B), and structured, machine-readable information (Figure 1C).
The general information (A) refers to the broader context of the experimental findings. This includes the literature reference, the organism used for the experiments and information about the organism strain, gender and age, if specified in the publication. The information about mouse strains is especially important for the purification of murine HSCs, since some frequently used stem cell markers, including Thy-1 and Sca-1, are not conserved among mouse strains [5,13]. Since most studies in the field of hematopoietic stem cells are performed with mice, the vast majority of HSC-Explorer content (90%) consists of experimental results from mice. Experimentally verified biological information is shown as short comments in the textual information part of the annotation (B). If appropriate, it also includes supplemental details about the experimental results, for example the number of cells found in different stages of the cell cycle. In addition, a short description of the methods used for the experiment is presented to inform the user how the data were obtained. The structured information (C) translates the biological findings into very basic information given as the relation between two elements. Elements can be genes (or their respective proteins), metabolites or miRNAs, but also cellular processes, tissues (or their respective cell types) or a stem cell sub-population characterized by the marker combination (immunophenotype) used for its purification. For the curation of the structured information we use established resources such as Entrez Gene [14], Gene Ontology [15] and CORUM [16] and the vocabularies used therein. Standardized information is a prerequisite for applications such as the construction of biological networks, the generation of diagrams and the subsequent post-processing of data using bioinformatics methods. In addition, we indicate whether the result was obtained from in vitro or in vivo experiments.
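The three-part annotation layout described above can be pictured as a small record type. The following Python sketch is purely illustrative: the class names, field names and relation label are assumptions made for this example and do not reflect the actual HSC-Explorer schema. The example record encodes the finding, mentioned later in this article, that addition of Angpt1 inhibits Cxcr4 expression in vitro [36]; values not stated in the text are left as explicit placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of allowed experimental settings (the text names these two).
ALLOWED_SETTINGS = {"in vivo", "in vitro"}


@dataclass(frozen=True)
class Element:
    """One side of a structured relation (field names are assumptions)."""
    kind: str  # e.g. "gene", "miRNA", "process", "tissue", "immunophenotype"
    name: str


@dataclass(frozen=True)
class Annotation:
    # (A) general information
    reference: str
    organism: str
    strain: Optional[str]
    # (B) textual information
    comment: str
    method: str
    # (C) structured, machine-readable information
    source: Element
    relation: str
    target: Element
    setting: str

    def __post_init__(self):
        # Reject settings other than the two named in the text.
        if self.setting not in ALLOWED_SETTINGS:
            raise ValueError(f"unknown experimental setting: {self.setting!r}")


# Illustrative record; organism and method are placeholders, not curated data.
example = Annotation(
    reference="[36]",
    organism="not specified in this sketch",
    strain=None,
    comment="Addition of Angpt1 inhibits Cxcr4 expression.",
    method="not specified in this sketch",
    source=Element("gene", "Angpt1"),
    relation="inhibits expression of",  # hypothetical relation label
    target=Element("gene", "Cxcr4"),
    setting="in vitro",
)
```

Keeping part (C) as a strict source/relation/target triple is what makes such records straightforward to assemble into a network, with elements as nodes and relations as edges.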
All data from HSC-Explorer can be downloaded as flat files or in SBML format (Systems Biology Markup Language), a free and open XML-based interchange format for computer models of biological processes [17]. Graphical outputs can be downloaded in JPEG format or GraphML format. The latter can be opened and edited with the graph editor yEd (yWorks GmbH, Tübingen, Germany). In the future, we will work on expanding the database to ultimately cover all hematopoietic progenitor populations and on generating novel 'Selected topics'.
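A downloaded GraphML file can also be processed programmatically. The sketch below uses only the Python standard library and reads the edge list from a small, hand-written GraphML document. The sample content (node identifiers and edges) is an assumption for illustration; files exported from HSC-Explorer may carry additional attributes, for instance layout data for yEd, that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hand-written sample; not an actual HSC-Explorer export.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="Cxcl12"/>
    <node id="Cxcr4"/>
    <node id="homing"/>
    <edge source="Cxcl12" target="Cxcr4"/>
    <edge source="Cxcr4" target="homing"/>
  </graph>
</graphml>"""


def read_edges(graphml_text):
    """Return (source, target) pairs in document order."""
    root = ET.fromstring(graphml_text)
    return [
        (edge.get("source"), edge.get("target"))
        for edge in root.findall(".//g:edge", GRAPHML_NS)
    ]


def neighbours(edges, node):
    """Return all nodes directly connected to `node`, ignoring direction."""
    linked = set()
    for source, target in edges:
        if source == node:
            linked.add(target)
        elif target == node:
            linked.add(source)
    return linked


edges = read_edges(SAMPLE)
print(edges)
print(sorted(neighbours(edges, "Cxcr4")))
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`.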

Database Contents
The HSC sub-populations and their activities. One crucial question in hematopoietic stem cell biology is the identification of markers that distinguish primitive stem cells from more differentiated progenitors with reduced self-renewal and repopulation activity [18]. In recent years several highly enriched HSC populations have been identified, for example by introducing SLAM markers for HSC enrichment [5], by considering HSC-specific functional properties such as a higher capability for Hoechst dye efflux (side population, SP) [1,7], or by detecting low levels of Rhodamine 123 staining (Rho(lo)) [8,19]. HSC-Explorer provides a comprehensive collection of these HSC subpopulations (Table 1), illustrating the heterogeneity of stem cell enrichment protocols published so far. If applicable, the potential of each HSC population for multilineage reconstitution is stated in our database. Since the term "long-term HSC" (LT-HSC) is used heterogeneously throughout the literature, we classify stem cells not only with the term "LT-HSC" but also by linking them to the process "repopulation >16 weeks". Interestingly, repopulation activity was measured for longer than 32 weeks in some studies [2,20-22]. These stem cells are linked with the process "repopulation >32 weeks" to emphasize that their repopulation activity is higher than the currently used standard of 16 weeks. More differentiated progenitor cells are linked with the term "repopulation <12 weeks".
In addition to their repopulation activity, hematopoietic stem cells are further characterized in HSC-Explorer by their proliferation state (quiescence) and their capacity to self-renew. The term "self-renewal" is only used in cases of reconstitution of secondary recipients.
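The linking rules described above can be summarized as a small decision function. This is a sketch of the stated thresholds only, not the curators' actual procedure: the function names are hypothetical, the choice to return only the most specific repopulation term is an assumption, and durations between 12 and 16 weeks are left unclassified because the text does not assign them a term.

```python
from typing import Optional


def repopulation_term(weeks_observed: float) -> Optional[str]:
    """Map an observed repopulation duration (in weeks) to a process term.

    Returns the most specific term (an assumption of this sketch), or None
    when the duration falls into the range the text leaves undefined.
    """
    if weeks_observed > 32:
        return "repopulation >32 weeks"
    if weeks_observed > 16:
        return "repopulation >16 weeks"
    if weeks_observed < 12:
        return "repopulation <12 weeks"
    return None  # 12 to 16 weeks: no term is defined in the text


def may_use_self_renewal(secondary_recipients_reconstituted: bool) -> bool:
    """'Self-renewal' is used only if secondary recipients were reconstituted."""
    return secondary_recipients_reconstituted


for weeks in (8, 14, 20, 40):
    print(weeks, repopulation_term(weeks))
```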
Table 1 summarizes 'milestone publications' describing new markers that greatly improved the purification of HSCs. The different attributes of these stem cell-enriched populations concerning repopulation activity, self-renewal and quiescence are indicated. Whenever the authors performed single-cell transplantation, this is mentioned in the table as well. The graphical network of these data, as obtained by searching the database with the term 'keypopulation', is shown in Figure S1.
The stem cell niche. The concept of the hematopoietic stem cell niche was introduced in 1978 by Schofield [23], who demonstrated that bone marrow stromal cells play an active role in the regulation of stem cell fate. Progress in the purification of HSCs [5] and in live imaging techniques [24] enabled the identification of candidate niche cells, which are all listed in Tables 2 and 3 together with the HSC sub-population described in the respective study. These tables also list the genes and the corresponding signaling processes responsible for the interplay between stromal and hematopoietic cells. In addition, HSC-Explorer provides information about cellular processes important for establishing the HSC niche, such as homing, cell migration, mobilization and cell adhesion.
Curation of high-throughput data. Distinct HSC subpopulations are also characterized by their expression profiles. Several microarray studies have been performed to identify candidate genes involved in differentiation and self-renewal or quiescence [9,25]. In total, the 3590 most significantly differentially expressed genes identified in these large-scale analyses are included and can be retrieved from the database with a search for "expression profile". As already discussed by Vogel [26], expression profiles of HSC populations analyzed in different labs are almost completely different, even if the same purification procedures have been used for isolating the stem cells, indicating that the populations were heterogeneous [27]. However, the inclusion of various kinds of experimental data in HSC-Explorer allows genes found only by large-scale expression analysis to be combined with experimentally verified data. Therefore we compared the expression profiles, including microarray data and qPCR data, of four different HSC subpopulations (HSC (CD34-SP,KSL), HSC (Rho(lo)KSL), HSC (SP,KSL), HSC (Thy1(lo)CD135-KSL)) with long-term repopulation activity. As shown in the graph on the homepage (Selected topic), the gene sets are quite different and only two genes, Procr and Rbp1, have been identified in all four subpopulations. Procr is a well-known surface antigen already used for the purification of hematopoietic stem cells. The retinol-binding protein Rbp1, an important factor in retinoic acid synthesis, is involved in granulopoiesis [28] and has recently been reported to be hypermethylated in glioma tumors, resulting in decreased expression of Rbp1 [29]. To our knowledge, a role of Rbp1 in the differentiation of hematopoietic stem cells has so far not been shown and would be interesting to investigate.
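The comparison of the four expression profiles amounts to a set intersection. The Python sketch below illustrates the operation with toy data: the four subpopulation names and the shared genes Procr and Rbp1 come from the text, whereas all other gene names (`GeneA` to `GeneF`) are invented placeholders and not curated database content.

```python
# Toy gene sets; only Procr and Rbp1 are taken from the text.
# GeneA..GeneF are placeholders standing in for population-specific genes.
profiles = {
    "HSC (CD34-SP,KSL)": {"Procr", "Rbp1", "GeneA", "GeneB"},
    "HSC (Rho(lo)KSL)": {"Procr", "Rbp1", "GeneB", "GeneC"},
    "HSC (SP,KSL)": {"Procr", "Rbp1", "GeneD"},
    "HSC (Thy1(lo)CD135-KSL)": {"Procr", "Rbp1", "GeneE", "GeneF"},
}


def shared_genes(gene_sets):
    """Return the genes present in every one of the given sets."""
    sets = list(gene_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


common = shared_genes(profiles.values())
print(sorted(common))  # ['Procr', 'Rbp1'] for the toy data above
```

With real data, each set would be filled from a database search for "expression profile" restricted to the respective subpopulation.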

Search Options and Web Interface
The flexible web-based interface of HSC-Explorer allows both general and advanced searching of the database. By default all fields are searched. However, the user has the option to restrict the search to a specific category such as 'Immunophenotype', 'Gene', 'Biological process', 'Tissue', 'Chemical compound', 'miRNA', the PubMed identifier or the authors of articles. The category 'Immunophenotype' had to be introduced in this context because currently no single defined hematopoietic stem cell exists; instead, a variety of different HSC sub-populations have been described in the literature so far. All these sub-types are characterized by a combination of markers used for their purification, the so-called immunophenotype (for example HSC CD150+CD48-KSL [cKit+Sca1+Lin-]). Genes are described by their official Entrez Gene name [14] (e.g. Slamf1). For convenient retrieval of the required information, the respective synonyms from Entrez Gene and KEGG [30] are also included in searches. The category 'Tissue' summarizes all the different stromal cell types known to establish the stem cell niche. Stem cell-specific processes such as 'repopulation activity', 'self-renewal activity' or 'quiescence' are summarized in the search field 'Biological process'.
Search results can be improved further with iterative searches by using the 'Refine Query' option. Results of the new search term are combined with those of the first search term by using one of the three Boolean operators 'AND', 'OR' and 'NOT'. The search results are listed as tables that can either be linked to the complete annotated information or be visualized graphically with a tool that dynamically generates an interaction network. Graphs are interactive and provide several options for retrieval of information or exploration of interaction networks. When the mouse cursor is moved over an edge, a pop-up window with information about the respective interaction appears. Another useful function is the extension of the graph: double-clicking on a node of interest shows all additional relations linked to that node. Further functionalities of the database and the graph tool are explained in the online help documentation and two tutorial movies.
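The effect of the three Boolean operators in 'Refine Query' can be illustrated with set operations on result identifiers. This is a conceptual sketch, not the server-side implementation, which is not described here: the function name and the numeric annotation identifiers are hypothetical.

```python
def refine(current_hits, new_hits, operator):
    """Combine two result sets the way a Boolean query refinement would.

    current_hits / new_hits: sets of (hypothetical) annotation identifiers.
    operator: 'AND', 'OR' or 'NOT'.
    """
    if operator == "AND":
        return current_hits & new_hits   # entries matching both terms
    if operator == "OR":
        return current_hits | new_hits   # entries matching either term
    if operator == "NOT":
        return current_hits - new_hits   # first-term entries lacking the new term
    raise ValueError(f"unsupported operator: {operator!r}")


# Hypothetical identifiers returned by two searches.
hits_cxcl12 = {1, 2, 3, 4}
hits_homing = {3, 4, 5}

print(sorted(refine(hits_cxcl12, hits_homing, "AND")))  # [3, 4]
print(sorted(refine(hits_cxcl12, hits_homing, "OR")))   # [1, 2, 3, 4, 5]
print(sorted(refine(hits_cxcl12, hits_homing, "NOT")))  # [1, 2]
```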
In addition to the ability to generate user-defined graphs, HSC-Explorer provides several predefined graphs (Selected topics) on the homepage displaying, for example, proteins known to influence the proliferation state (quiescence) or the self-renewal activity of a stem cell. Short summaries of signaling processes known to affect hematopoiesis, such as Cxcl12/Cxcr4, Ang1/Tie2 or Notch signaling, are presented as well. One graph provides an overview of niche-related data, including stem cell sub-populations, cellular components of the bone marrow niche, extrinsic or intrinsic factors and signaling processes necessary for establishing the niche. A smaller excerpt of this graph, displaying only the co-localization of HSCs with non-hematopoietic tissue, is given as a selected topic as well.

Application Examples
An objective of HSC-Explorer is to present complex experimental results relevant for murine hematopoiesis as an intuitive graphical output. The following examples illustrate the capability of HSC-Explorer to display complex data in a comprehensive and easy-to-follow manner.
1. The role of Cxcl12 signaling in homing and engraftment of hematopoietic stem cells. The chemokine Cxcl12 plays a key role during ontogeny of the hematopoietic system. A graphical visualization of the Cxcl12 interactions in HSC-Explorer clearly shows that Cxcl12 is mainly expressed in non-hematopoietic tissue such as osteoblasts [31] or CAR cells [32]. Extension of the network with the Cxcr4 interactions reveals the following results (Fig. 2): (i) The chemokine receptor Cxcr4 is mainly expressed in LT-HSCs (e.g. HSC (CD34-KSL) or HSC (CD150+CD48-CD41-KSL)). Cxcl12/Cxcr4 signaling not only plays a pivotal role in the regulation of HSC homing (discussed in [33]) and LT-repopulation activity of stem cells [34], but is also involved in the maintenance of quiescence [34], probably by increasing the expression level of the cell cycle inhibitor Cdkn1c (p57-kip2) [35] or by down-regulating the expression of other cell cycle regulators such as cyclin D1 (Ccnd1) [35]. (ii) The graph shows that depletion of Cxcr4 results in a down-regulation of HSC regulators such as Tek (Tie2), Vegfa and Junb in HSC (CD34-KSL) [34]. (iii) Addition of Angiopoietin-1 (Angpt1) inhibits Cxcr4 expression in vitro [36], whereas Wif1, a known Wnt signaling inhibitor, increases the expression of Cxcl12 [37]. (iv) Not only genes but also miRNAs are involved in the regulation of Cxcl12 expression. Transfection of miR-886-3p into a Cxcl12+ stromal cell results in a down-regulation of Cxcl12 [38]. (v) Furthermore, the graph reveals the cooperation of Robo4 with Cxcr4 in the homing process. Loss of Robo4 is compensated by up-regulation of Cxcr4 expression [39]. (vi) High levels of Cxcl12, e.g. induced by 5-FU treatment or irradiation, are known to elevate MMP9 levels [40], which results in high amounts of soluble KitL, a factor important for cell mobilization [41].
2. Graphical visualization of the opposing effects on hematopoiesis induced by the polycomb repressive complex PRC1. Fine control of gene expression by modulating chromatin structure is a widely used mechanism in eukaryotes to regulate cell development. The multisubunit polycomb repressive complex PRC1 catalyzes histone modifications and has been implicated in the maintenance of hematopoietic stem cells. Figure 3 illustrates that the subunit composition of PRC1 is responsible for the balance between self-renewal and differentiation of hematopoietic stem cells. Cbx7 is most highly expressed in primitive stem cells, whereas the protein level of Cbx8 increases during lineage commitment [42]. Both Cbx proteins compete for integration into the PRC1 complex and have different effects on stem cell activity. While Cbx7 inhibits HSC differentiation and induces self-renewal, Cbx8 has the opposite effect. In addition, it is known that the polycomb subunit Bmi1 has a positive effect on the stemness of adult hematopoietic stem cells [43], whereas the Bmi1 paralogue Pcgf2 (Mel18) negatively regulates self-renewal activity [44]. However, the composition of the complex is not the only decisive factor. PRC1 is also regulated by external MAPKAP kinases such as Mapkapk2 (MK2) to improve the balance between HSC self-renewal and differentiation [45].
Table 3. Bone marrow stromal cell types (vascular/endothelial) known to contribute to the hematopoietic microenvironment.
In conclusion, HSC-Explorer provides a publicly available integrative resource in the field of hematopoiesis. HSC-Explorer has been developed to present biological findings in the form of a comprehensive and intuitive graphical network, which will enable scientists to explore hematopoiesis with a more systems-oriented approach.