C/VDdb: A multi-omics expression profiling database for a knowledge-driven approach in cardiovascular disease (CVD)

The cardiovascular disease (C/VD) database is an integrated and clustered information resource that covers multi-omic studies (microRNA, genomics, proteomics and metabolomics) of cardiovascular-related traits, with special emphasis on coronary artery disease (CAD). The resource was built by mining existing literature and public databases, followed by manual biocuration. To enable integration of omic data from distinct platforms and species, a specific ontology was applied to tie together and harmonise multi-level omic studies based on gene and protein clusters (CluSO) and mapping of orthologous genes (OMAP) across species. CAD continues to be a leading cause of death worldwide and is generally thought to be an age-related disease. However, CAD incidence rates are now known to be highly influenced by environmental factors and their interactions, in addition to genetic determinants. Because of the complexity of CAD aetiology, its general elements are more difficult to elucidate than those of other cardiovascular diseases. Data from 92 studies, covering 13945 molecular entries (4353 unique molecules), are described, including data descriptors for experimental setup, study design, discovery-validation sample size and associated fold-changes of the differentially expressed molecular features (p-value<0.05). A dedicated interactive web interface, equipped with a multi-parametric search engine, data export and indexing menus, is provided for a user-accessible browsing experience. The main aim of this work was the development of a data repository linking clinical information and molecular differential expression in several CVD-related traits from multi-omics studies (genomics, transcriptomics, proteomics and metabolomics).
As an example case of how to query and identify data sets within the database framework and concomitantly demonstrate the database utility, we queried CAD-associated studies and performed a systems-level integrative analysis. URL: www.padb.org/cvd

Introduction
resolution tandem mass spectrometry (MS/MS) has become a widely accepted method for the measurement of proteins [24,25].
Metabolites are not only the end-products of enzymatic reactions, but also active participants in the regulation of the cellular microenvironment, in which changes in abundance at the intracellular level of metabolite intermediates can promote changes in gene signal transduction [26,27] and therefore can be used to investigate the general health status [28].
Databases handling high-throughput expression profiles of miRNAs, genes, proteins and metabolites in pathological conditions and disease stages are still scarce, with the exception of general-scope transcriptomics repositories such as NCBI Gene Expression Omnibus (GEO) [29] and EBI Expression Atlas [30]. However, the core source of publicly available omic data still resides in peer-reviewed publications. We therefore developed a clustered molecular repository for cardiovascular and vascular diseases, C/VD, containing expression profiles of microRNAs, proteins and metabolites across tissues, body-accessible fluids and species, derived from experimental data collected from peer-reviewed research publications and other relevant databases. Additionally, as a showcase of the database utility and to illustrate potential uses of the data enclosed in our newly established database, we performed a meta-analysis integrating omics data sets at the pathway level in CAD.

C/VD database construction
Data collection, storage and handling. The content of the C/VD database is based on manual extraction of data from the available literature on the topic of CVD and omics technologies. We used several search strings in PubMed and Scopus in order to cover the topic as broadly as possible (e.g. "Cardiovascular Diseases"[Mesh] AND (miRNA OR genomic* OR proteomic* OR metabolomic*) NOT review) and filtered the results to display only studies in the English language. Additional gene and miRNA data sets were retrieved from the Gene Expression Omnibus (GEO) Database [29]. We kept the data in tabular form and retained information on the original source for easy data tracking.
Data biocuration. Publication-based molecular identifiers were converted to internal identifiers of the Pan-omics Analysis Database (PADB) initiative (www.padb.org), which uses clustered sequences and orthologs (CluSO) identifiers and an ortholog mapping resource (OMAP). These internal identifiers allow us to reduce the data redundancy found in many multi-omic studies. Moreover, they ensure that the C/VD database remains fully integrated with other databases held within the PADB infrastructure.
Database deployment. C/VD relies on a NoSQL database system based on dynamically interlinked and pre-assembled HTML files generated with in-house built software, as described in more detail elsewhere [31]. Briefly, source data are manipulated in spreadsheets and parsed into pre-assembled and interlinked HTML files. In C/VD there are three source tables, handling demographics (clinical data), experimental setup descriptors and molecular descriptors. All of these tables follow standard database nomenclature and structure, i.e. they are all linked to each other and to external databases, for instance UniProt/SwissProt [32], ChEBI [33] [37]. In this way, the customised parser emulates a database system by pre-assembling query outputs, for example by grouping entries by disease, tissue or molecule.
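The pre-assembly idea described above can be illustrated with a minimal sketch. It is not the in-house software referenced in [31]; the function names, the row layout (one dictionary per molecular entry) and the file-naming scheme are all hypothetical, chosen only to show how grouped query outputs can be rendered to static HTML ahead of time.

```python
import html
from collections import defaultdict


def group_rows(rows, key):
    """Group source-table rows (dicts) by the value of one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def render_table(rows, columns):
    """Render rows as a plain HTML table, escaping every cell value."""
    head = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row[col]))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


def preassemble(rows, key, columns):
    """Return {file name: HTML}, one pre-assembled page per group value.

    The "<key>_<value>.html" naming is an assumption made for this sketch.
    """
    return {
        f"{key}_{value}.html": render_table(group, columns)
        for value, group in group_rows(rows, key).items()
    }


# Hypothetical rows standing in for the molecular descriptor table.
rows = [
    {"disease": "CAD", "molecule": "APOA1"},
    {"disease": "HF", "molecule": "hsa-miR-208b"},
    {"disease": "CAD", "molecule": "APOB"},
]
pages = preassemble(rows, "disease", ["molecule"])
```

Because every grouping is computed once at build time, the front-end only has to serve static files, which is the behaviour the text attributes to the customised parser.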
Conversion of the collected data from source tables to html pages that constitute the frontend of the database was ensured via in-house built software. Interactive controls were also added to the html tables such as sorting, pagination, multi-column sorting and data export features (Excel, PDF, CSV) by incorporating the DataTables plugin [38] for the jQuery JavaScript library [39].
A search interface was also implemented in the database that relies on the Hypertext Preprocessor (PHP) [40] scripting language, based on summary tables i.e. rearranged tables that simplify the underlying queries, therefore allowing fast, enhanced and customised searches across the database.
Availability and requirements. Home page: Accessible through the PADB portal at http://www.padb.org/cvddb. Operating system: C/VD is a web-based database thus it is platform independent. Requirements: works in any browser supporting HTML 4.01 and Java. License: free for academic use, but requires a license for any commercial purposes.

Case study with a C/VD subset: CAD
Data selection, pre-processing and meta-analysis. We selected a subset of C/VD consisting of CAD data sets from human cohorts, screened in blood, that compare CAD cases with control groups in at least two omics technologies. Since C/VD only handles data below the P-value cut-off of 0.05, redundant molecular entries within the same study were combined by averaging their expression values, and their intra-study P-values were combined using the well-established Fisher's method [41]. Other methods exist, but Fisher's method performs better in our data setting, since Stouffer's method [42] requires studies of similar quality, such as those sharing the same detection platform, and also requires a similar number of missing values.
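For reference, Fisher's method combines k independent p-values through the statistic X² = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is an illustration of that formula only, not the authors' implementation; the function names are hypothetical. It uses the closed-form chi-square tail for even degrees of freedom, so no external library is needed.

```python
import math


def chi2_sf_even_df(x, df):
    """Chi-square survival function P(X >= x) for an even number of degrees of freedom.

    For df = 2k the tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def fisher_combine(p_values):
    """Combine independent p-values; returns (statistic, combined p-value)."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return statistic, chi2_sf_even_df(statistic, 2 * len(p_values))


# Two redundant entries of one molecule within a study (illustrative values).
statistic, combined_p = fisher_combine([0.04, 0.01])
```

A single p-value is returned unchanged by this procedure, which is a quick sanity check on the formula.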
A vote counting strategy [43] was used to combine molecular information. Differentially expressed (DE) molecules were first selected based on an overall statistical threshold of P < 0.05. Then, depending on the type of omics technology/platform, different fold-change (FC) cut-offs were applied to obtain a list of DE molecules for each study: genomics, including miRNA and gene expression, |FC| ≥ 2.0; proteomics, |FC| ≥ 1.5; and metabolomics, |FC| ≥ 1.3. The vote for each molecule was then calculated as the total number of times (frequency (%) distribution) it occurred in all the DE study lists to be combined. Afterwards, the final list of DE molecules was selected based on a minimal number of votes. This simple yet robust approach supports further downstream analysis and outperforms other meta-analysis methods in this setting, since most cannot handle highly heterogeneous data types with a high degree of missing values. Regulation directionality was then assessed from the frequency distribution of down- and up-regulation trends across three fold-change cut-offs (|FC| ≥ 1.3, 1.5 and 2.0) observed in microRNA (MIR), gene (mRNA), protein (PRO) and metabolite (MET) expression data. In any case of contradictory regulation, prevalence was given to the direction observed with a frequency ≥60%.
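The vote counting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the study layout (a dictionary with an "omics" label and signed fold-changes paired with p-values), the function names and the default of two votes are hypothetical, and the cut-offs are read as "greater than or equal to".

```python
from collections import Counter

# Fold-change cut-offs per omics type, as quoted in the text (assumed inclusive).
FC_CUTOFFS = {"MIR": 2.0, "mRNA": 2.0, "PRO": 1.5, "MET": 1.3}


def de_list(study, p_cutoff=0.05):
    """Return the set of differentially expressed molecules of one study.

    study["molecules"] maps a molecule name to (signed fold-change, p-value).
    """
    cutoff = FC_CUTOFFS[study["omics"]]
    return {
        name
        for name, (fc, p) in study["molecules"].items()
        if p < p_cutoff and abs(fc) >= cutoff
    }


def vote_count(studies, min_votes=2):
    """Count in how many DE lists each molecule occurs.

    Returns {molecule: (votes, frequency in % of the studies combined)},
    keeping only molecules that reach min_votes.
    """
    votes = Counter()
    for study in studies:
        votes.update(de_list(study))
    n_studies = len(studies)
    return {
        name: (count, 100.0 * count / n_studies)
        for name, count in votes.items()
        if count >= min_votes
    }


# Illustrative input: two proteomics studies with made-up values.
studies = [
    {"omics": "PRO", "molecules": {"APOA1": (-1.8, 0.01), "CRP": (1.2, 0.01)}},
    {"omics": "PRO", "molecules": {"APOA1": (-1.6, 0.03)}},
]
selected = vote_count(studies)
```

Molecules that a study does not report simply receive no vote from it, which is why the approach tolerates a high proportion of missing values.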
Data dimensionality reduction. Principal component analysis (PCA) and non-metric multi-dimensional scaling (MDS) analysis were performed in Primer-E (version 6.1.6) [44]. The input data for the PCA was a matrix containing differential expression as log2 fold-change (log2 FC) of each molecular entity (as variables, N = 654; miRNA, protein and metabolite) across several Omics study types (as samples, N = 75). In this analysis we used molecular data reported in at least three independent studies; missing data entries were set to zero. The triangular resemblance matrix required for MDS was based on Euclidean distances. MDS subsets were also generated to visualise superimposed Omics studies in 2D and 3D space.
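The two data structures described above, the zero-filled expression matrix and the triangular resemblance matrix, can be sketched in a few lines. This does not reproduce Primer-E; the function names and the dictionary-per-study input are hypothetical, and only the Euclidean distance step feeding the MDS is shown.

```python
import math


def build_matrix(studies, molecules):
    """Rows are studies (samples), columns are molecules (variables).

    Each study is a dict of {molecule: log2 fold-change}; a molecule that a
    study does not report is set to zero, as described in the text.
    """
    return [[study.get(name, 0.0) for name in molecules] for study in studies]


def euclidean_resemblance(matrix):
    """Lower-triangular matrix of pairwise Euclidean distances between rows.

    Row i holds the distances from sample i to samples 0..i-1.
    """
    return [
        [math.dist(matrix[i], matrix[j]) for j in range(i)]
        for i in range(len(matrix))
    ]


# Illustrative log2 fold-changes for three studies and two molecules.
studies = [{"a": 1.0, "b": 2.0}, {"a": 1.0}, {"b": -2.0}]
matrix = build_matrix(studies, ["a", "b"])
distances = euclidean_resemblance(matrix)
```

Setting missing entries to zero treats an unreported molecule as unchanged (log2 FC = 0), which pulls sparsely populated studies towards the origin; this is worth keeping in mind when reading the ordination plots.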
Functional enrichment and pathway analysis. The most reported molecular entries of each type of omic technology were the input for term clustering analysis using gene ontology (GO) annotation, including biological process (BP), molecular function (MF) and cellular component (CC), and for pathway term clustering using Kyoto encyclopedia of genes and genomes (KEGG) [45] and WikiPathways [46] annotation, as built into the ClueGO app [47] for the Cytoscape [48] network analysis platform. Default parameters were used (unless otherwise specified in the text), with the overall statistical significance threshold set to P-value <0.05.
Supervised development of a multi-omics integrative molecular model. The development of a molecular model in CAD followed an iterative design (Fig 1) involving data collecting, establishment of a multi-omics database, C/VD, meta-analysis, molecular clustering into GO and pathway terms and analysis of the interactome followed by several iteration swaps until the finalised output model was reached.
Post-transcriptional regulation by microRNAs (miRNAs) was verified by assigning miRNA-target interactions using pre-computed target predictions ( Contextualisation of the created model regarding the studied disease, CAD, was ensured by adding the highest scored gene-disease associations (GDA) from the DisGeNET database [58] and reconnecting all the molecular interactions.
At the end, the calculated overall fold-changes of all molecules derived from the meta-analysis were incorporated and colour coded on top of the built network-based model.
The molecular content of the developed model was then tested for over-representation (hypergeometric test) and pathway topology analysis using KEGG [45] pathway terms from the web interface of MetaboAnalyst [59].
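As an illustration of the over-representation step, the hypergeometric test gives the probability of observing at least a given number of pathway members in the query list by chance. The sketch below is a generic version of that test and makes no claim about how MetaboAnalyst implements it; the function name, parameter names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(hits, query_size, pathway_size, universe_size):
    """P(X >= hits) for a query drawn without replacement from the universe.

    hits          -- query molecules annotated to the pathway
    query_size    -- molecules in the query list (here, the model content)
    pathway_size  -- molecules annotated to the pathway in the universe
    universe_size -- all molecules in the reference set
    """
    upper = min(query_size, pathway_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, query_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 1 hit in a query of 2, for a pathway of 3
# molecules within a universe of 10.
p_value = hypergeom_enrichment_p(1, 2, 3, 10)
```

With zero required hits the function returns 1, since every possible draw qualifies; in practice the resulting p-values would also need correction for the number of pathways tested.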

Database structure and navigation
Navigation through the C/VD database front-end can be either by browsing the study, sample/ demographic, tissue/source or by molecular entry index pages, or alternatively through molecule names, tissue or disease queries in the search interface (S1-S5 Figs).

Statistics and data summary of C/VD
The database consists of 13945 molecule entries (redundant), of which 4353 are unique (non-redundant), from a collection of 98 manually curated studies of 50 publications related to CVD. A summary of the database content using a snippet of the categorical and numeric variables (Fig 2), plotted as a histogram-like chart built with the Tableplot R-package [60], helps uncover data trends amongst the many datasets that C/VD comprises. The tableplot shows that the main molecular entries populating the C/VD database correspond to human mRNA/gene entries detected in blood, mainly in granulocytes and mononuclear cells, in coronary heart disease (CHD). Amongst all molecular entities in C/VD, microRNAs (miRNA) tend to have lower raw p-values and higher fold-change (FC) values. Additionally, studies handling metabolomics profiling (MET) tend to have a larger number of individuals, displayed here as cohort/sample size (N) (Fig 2).
Almost half (48.53%) of the C/VD content (Fig 3A) within a multitude of omics studies concerns coronary heart disease (CHD), followed by arteriosclerosis (16.89%) and heart failure (11.00%), amongst other conditions that fall under 10% representation within C/VD. Most conditions are characterised by at least two types of omics technologies, with the exception of calcinosis, diabetes mellitus, cardiac resynchronization therapy (CRsyT) and stroke (Fig 3A).
Plotting molecular expression and statistical scores for each omics technology type in C/VD (Fig 3B) helps in the first instance to assess data dispersion/variability and to detect outliers. Many data points in proteomics (PRO) and metabolomics (MET) display a linear arrangement perpendicular to the y-axis, which corresponds to the overall statistical cut-off selected by the authors in many experimental omics studies. On the other hand, gene (mRNA) and microRNA (miRNA) differential expression studies tend to report an individual statistical score for each molecular entry, instead of an overall cut-off (Fig 3B).
The displayed genes/proteins (Fig 3B) with higher fold-change and p-value thresholds include RASSF3, XIRP1, SYNPO2L and SEPT7, which are involved in cell shape remodelling; PRKCB1, MMSADHA, GYG1 and SPCS3, which have enzymatic properties; TCF4 and BPTF, which have roles in transcription and translation (gene regulation); ANKRD46 and KTN1, which are transmembrane proteins functioning as gateways for many substances; and FOLR1, an anchor element in biological membranes acting as a receptor. Others, like MALAT1, TSC22D2 and CCDC197, have unknown roles. Highly expressed metabolites include LysoPC(14:0), a lysophospholipid with roles in lipid signalling and fatty acid (FA) metabolism; L-arginine, an essential amino acid (AA) with numerous functions within the body; and mannitol and N-Acetyl-D-Glucosamine 6-Phosphate (GlcNAc-6-P), sugar-based endogenous metabolites, the first a sugar alcohol with diuretic functions and the latter an acylaminosugar. Both are involved in the phosphotransferase system (PTS) in bacteria, which could suggest an association of cardiovascular disease with metabolites derived from the intestinal microbiota. Regarding microRNA expression, the predicted targets of hsa-miR-95, hsa-miR-142-5p and hsa-miR-206 are genes involved in proteolytic cascades mediated by ubiquitin. Similarly, hsa-miR-208b targets genes with functional roles in the p53 signalling pathway (Fig 3B).
Coronary artery disease (CAD) multi-omics data sets retrieved from C/VD regarding molecular entity type, for instance metabolite (MET), proteins (PRO) and microRNA (miRNA). Molecular entities from the studies were initially ranked by meta-analysis using a vote-counting approach (frequency assessment), in which statistical scores (p-values) were combined by implementing Fisher's method and differential expression of molecules (fold-change) was merged through a consensus approach. Following the meta-analysis, a combinatorial integration at the systems level of all molecular entities was performed, concerning modulation of transcriptional activity via assessment of transcription factors (TFs), post-transcriptional regulation ascertained by microRNA-gene interactions (computationally predicted and experimentally validated), protein-protein interactions (PPIs), evaluation of metabolic flux and regulation via association of gene-enzyme-reaction-compound, and over-representation analysis of gene ontology (GO) terms and KEGG pathways.
Fig 2. Tableplot of the C/VD database. Each column represents a variable and each row (bar) is an aggregate of a fixed number of records, i.e. a row bin (here equivalent to a molecule entry). The numeric variables (log10(N), log10(p-value) and log2(FC)) are displayed as bar charts and categorical variables (specie-disease) as stacked percentage bar charts. Disease type was used as the sorting variable. Cohort size (N), log10(p-value), log2(fold-change (FC)). Species: Homo sapiens (hsa), Mus musculus (mus). Tags: ENZ: enzyme, DIS: disease, SIG: signalling, INH: inhibitor, TF: transcription and translation, gene regulation, TP: transport, storage, endocytosis.
Data dimensionality reduction. Data dimension reduction methods were applied to the content of the C/VD database (Fig 4). The principal components analysis (PCA) (Fig 4A), in which PC1 (75.5%) and PC2 (18.4%) together describe 93.9% of the cumulative variance, was based on a matrix handling differential expression of each molecular entity per Omics type as variables and study ID ("Exp") as samples. The PCA plot shows the score of each disease case and the loadings of each variable, i.e. the molecular elements, on the first two principal components.
The greater spatial distance between heart failure (HF) and most of the other disease cases, which overlap in the plot, suggests that HF can be an outlier, mostly due to the contribution of microRNA expression, with hsa-miR-208b (BZJ12), hsa-miR-208a (BZJ35), hsa-miR-125b-1-3p (BZQ81) and hsa-miR-376a-2-5p (BZ615) as the main molecular culprits. Visualising the level of resemblance among the many CVD disease cases of the highly overlapped cluster (Fig 4A) with a non-metric multi-dimensional scaling (MDS) method (Fig 4B1 and 4B2) made it possible to compare pairwise spatial distances in 2D (Fig 4B1) and 3D (Fig 4B2) among the tissue/fluid sources of origin, e.g. blood and heart tissue.

Case study with a C/VD subset: CAD
Data sets description. We selected a subset of 21 studies from the C/VD database regarding CAD across miRNA (two studies), protein (nine studies) and metabolite (10 studies) expression detected in blood of human cohorts (Table 1). Additional description of other clinical parameters such as age, gender, and clinical history of cases and controls is available in S1 File.
The frequency (%) distribution of the regulation (down, up and not regulated) across three fold-change (FC) thresholds (|FC|>1.3, 1.5 and 2.0) for the microRNAs (MIR), proteins (PRO) and metabolites (MET) most reported in CAD (at least two times) is shown in Table 2. In any case of contradictory regulation, prevalence was given to the direction observed with a frequency ≥60%. The expression matrix handling raw differential expression values is available in S1 File.
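The consensus rule for regulation direction can be sketched as below. This is one possible reading of the rule rather than the procedure used to build Table 2: the function name is hypothetical, fold-changes are assumed to be signed (negative for down-regulation), the threshold is treated as inclusive, and the 60% share is computed over all reports of a molecule.

```python
def regulation_consensus(fold_changes, fc_cutoff, min_share=60.0):
    """Call the regulation direction of one molecule across studies.

    fold_changes -- signed fold-changes reported for the molecule
    fc_cutoff    -- absolute fold-change threshold (e.g. 1.3, 1.5 or 2.0)
    min_share    -- frequency (%) a direction needs to prevail when up- and
                    down-regulation are both reported
    Returns (call, frequency distribution in %).
    """
    if not fold_changes:
        raise ValueError("at least one fold-change is required")
    n = len(fold_changes)
    up = sum(1 for fc in fold_changes if fc >= fc_cutoff)
    down = sum(1 for fc in fold_changes if fc <= -fc_cutoff)
    shares = {
        "up": 100.0 * up / n,
        "down": 100.0 * down / n,
        "not regulated": 100.0 * (n - up - down) / n,
    }
    if up and down:
        # Contradictory reports: a direction prevails only with enough share.
        winners = [d for d in ("up", "down") if shares[d] >= min_share]
        call = winners[0] if winners else "contradictory"
    elif up:
        call = "up"
    elif down:
        call = "down"
    else:
        call = "not regulated"
    return call, shares


# Illustrative values: four of five reports point to down-regulation.
call, shares = regulation_consensus([-1.6, -1.8, 1.7, -2.1, -1.5], fc_cutoff=1.5)
```

Running the same molecule through the three thresholds in turn gives one row of the frequency distribution described in the text.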
The 76 most reported molecules in CAD, which include 18 microRNAs, 16 proteins and 42 metabolites (Table 2), were analysed by gene ontology (GO), for their biological process (BP), molecular function (MF) and cellular component (CC), and/or by Kyoto encyclopedia of genes and genomes (KEGG-compounds) (Fig 5).
GO cluster analysis using ClueGO (Fig 5A) showed that the most reported proteins in CAD are primarily involved in the biological regulation of cholesterol transport (negative regulation (APOC1 and APOC2)) and absorption (intestinal absorption (APOA1 and APOA4)), modification of the chylomicron composition (APOB, APOC2, APOA1 and APOA4), regulation of metabolic processes regarding fatty acid biosynthesis (APOC1, positive regulation (ADIPOQ, APOA1, APOA4 and APOC2)) and cellular detoxification of superoxide radicals via peroxidase activity (APOA4, MPO, GPX3 and PRDX2). These molecules consist mostly of plasma lipoprotein particles (very-low-density (APOL1, APOC1, APOA4, APOA1, APOC2 and APOB), low-density (APOA1, APOC2 and APOB) and intermediate-density (APOB, APOC2 and APOA1)). The nine characterised proteins clustered in the network follow a trend of decreased expression, so we can expect the same trend for their associated functions and biological processes described above. Additionally, ClueGO analysis of proteins reported once
Data input was a matrix with differential expression of each molecular entity per Omics type as variables and study ID ("Exp") as samples. ICD-10 (2016) disease classification was used as categorical labels. The PCA (a) displays both the PCA scores, X-axis: PC1 (75.5%), Y-axis: PC2 (18.4%), with the 95% confidence ellipse surrounding CVD conditions, and the PCA loadings, with the molecular variables hsa-miR-208b (BZJ12), hsa-miR-208a (BZJ35), hsa-miR-206 (BZ290), hsa-miR-125b-1-3p (BZQ81) and hsa-miR-376a-2-5p (BZ615) represented. Visualisation of the level of similarity between CVD conditions of the highly superimposed cluster (a) was achieved by implementing a non-metric multi-dimensional scaling (MDS) approach in 2D (b1) and 3D (b2) space. The MDS plot exhibits pairwise dissimilarities as a measure of the Euclidean distances among data points. Tissue/fluid sources (blood, heart) are represented as a factor of the MDS analysis.
Similarly, compound analysis using the Kyoto encyclopedia of genes and genomes (KEGG-compounds) in ClueGO (Fig 5B) yielded the processes of biosynthesis of unsaturated fatty acids (hexadecanoic acid, (8Z,11Z,14Z)-icosatrienoic acid, alpha-linolenic acid, docosahexaenoic acid and icosanoic acid), steroid hormone biosynthesis (pregnenolone sulfate, cortisol and DHEA sulfate) and central carbon metabolism (L-tyrosine, L-histidine, L-valine, L-arginine and glycine), with child elements such as biosynthesis of amino acids, ABC transporters, aminoacyl-tRNA biosynthesis, and protein digestion and absorption. Most of the metabolites, nine out of thirteen, were found to be decreased in expression (the exceptions being L-valine, L-arginine, cortisol and DHEA sulfate), so one can expect the same trend for their associated biological and metabolic processes.
MicroRNAs exert post-transcriptional regulation upon mRNA/gene targets, so inversely regulated miR-target pairs can contribute to either an increased or a decreased expression of the target. Based on the network cluster (Fig 5C) that connects microRNAs and targets via miRanda target prediction (using the CluePedia 1.5.0 app) to biological processes and molecular functions, and observing the regulation trend of microRNAs in Table 2, we can verify that 10 out of 13 microRNAs displayed in the network are decreased in expression (with the exception of miR-579, let-7c and miR-943). One can therefore expect that their targets would be increased in expression, as would their associated biological processes. Specifically, miR-661, miR-20a-5p and miR-106b-5p have a greater number of targets and follow a down-regulation trend; their targets are involved in protein lipidation, C-acetyltransferase activity, adherens junction organization and cellular biogenic amine catabolic process for the first microRNA, and additionally in PI3-kinase activity and inner cell mass cell differentiation for the latter two (Fig 5C). Additionally, a search for potentially affected KEGG pathways in mirPath v3.0 yielded the fatty acid biosynthesis pathway (KEGG map hsa00061), with the up-regulated microRNAs miR-1303 and miR-1305 targeting FASN.
(TaqMan OpenHSA), studies statistical score threshold: p-, q-value, FDR <0.05; high-density lipoprotein (HDL); blood pressure (BP); Body-mass-index (BMI); Males (M), Females (F); coronary artery disease (CAD), coronary disease (CD), acute myocardial infarction (AMI), stable angina (SA), unstable angina (UA), Non-obstructive coronary atherosclerosis (NOCA), normal coronary artery (NCA), type 2 diabetic coronary heart disease (T2DM-CHD), coronary atherosclerosis (CAS).

Interactome analysis and network enrichment
We established a global interactome network (Fig 6) starting with the most reported proteins in CAD, making associations based on protein-protein interactions using the STRING app. Afterwards, as a gap-filling approach, the least reported proteins (S1 File) were added, as well as the highest scored gene associations from the DisGeNET database (S1 File). Furthermore, we incorporated metabolic associations providing the gene-enzyme-reaction-metabolite linkage by merging the former network with a metabolic network built in MetScape (S7 Fig) and described by ClueGO analysis (S8 and S9 Figs). Later, we kept only the most reported and regulated microRNAs (miRNAs), providing their association with gene targets present in the network via CluePedia. The possible occurrence of other regulatory elements in the newly developed network, such as transcription factors (TFs), was assessed using CyTargetLinker and GeneMANIA. This resulted in a densely interconnected network (Fig 6). This network of 136 entries (the expanded full network contains 176 molecular entities, S10 Fig), including molecules known to be associated with the query-molecules, comprises an under-expressed apolipoprotein-cluster (containing APOC1, APOB, APOA4, APOA1, APOC2 and APOL1) together with LCAT, a main driver of the extracellular metabolism of lipoproteins; the complement and coagulation cascades (FGA, FGG, SERPIND1, KNG1 and VTN), suggesting that inflammation is activated in the original source tissue; and processes including extracellular matrix receptor interaction (ITGB3, SPP1, CD44 and VTN), biosynthesis of unsaturated fatty acids (hexadecanoic acid, (8Z,11Z,14Z)-icosatrienoic acid, alpha-linolenic acid, docosahexaenoic acid, docosapentaenoic acid, icosanoic acid, arachidate and oleate), steroid hormone biosynthesis (pregnenolone sulfate, cortisol, corticosterone, 11-dehydrocorticosterone and DHEA sulfate), amino acid metabolism (L-arginine, L-histidine, glycine and L-alanine), transport (ALB, APCS), cellular detoxification of oxygen species (PRDX2, MPO and GPX3) and actin cytoskeleton (CFL1, ITGAD, ITGB3, ACTG1, TMSB4X). Inter-linking molecules suggest an involvement of transcriptional elements such as ATF2, PPARG, PPARA and EGFR, which are modulators of genes involved in DNA damage, cell proliferation, anti-apoptosis, and glucose and lipid metabolic processes. Additionally, keeping only microRNAs that are highly expressed and inversely correlated with their targets yielded regulatory clusters driven by miR-1305 (directly targeting APOA1 and PPARA), miR-let-7c (directly targeting OLFM4, and the resulting query-molecules S100A9 and MT-COX2/COX2) and miR-1303 (targeting both resulting query-molecules IL6ST and FASN).
Analysis of the condensed network of direct interactions between the 136 proteins (176 in the full network, S10 Fig) shows the apolipoprotein-cluster containing APOC1, APOB, APOA4, APOA1, APOC2 and APOL1, also associated with ALB and LCAT; more than three interactions between IL6, IL10, RETN, CRP and ADIPOQ; and interactions between FABP1 and the apolipoprotein-cluster hub APOA1. The latter suggests that the peroxisome proliferator-activated receptor (PPAR) signalling pathway is perturbed in CAD through an association with APOA1, FABP1 (PPARA), ADIPOQ and RETN (PPARG) (Fig 6). Additionally, manual queries in the C/VD database (human, CAD or atherosclerosis, detected in blood, gene expression studies, FC>1.5) looking for regulation trends of network molecules with absent regulation revealed decreased levels of PPARG, MT-CO2/COX2 and FASN, confirming the general trend of the PPAR signalling pathway being inactivated/suppressed/blunted. Data mining of the DisGeNET database by associated disease clustering, with the 16 proteins from Table 2 as input, showed association of MPO, APOA1, APOB and ADIPOQ with myeloperoxidase deficiency, hypertensive disease, hypercholesterolemia, hyperlipidemia and hypoalphalipoproteinemia. In a similar approach to mine pathway entries in the KEGG database, the common denominator was hypertrophic cardiomyopathy (HCM, KEGG entry hsa05410), containing the cluster IL6, ITGB3 and ACTG1 (Table 3), linking it to the ECM-receptor interaction, the renin-angiotensin system, the JAK-STAT signalling pathway and the TGF-beta signalling pathway.

Discussion
The C/VD database handles multi-omics and clinical data retrieved from published experimental studies in literature and by mining NCBI Gene Expression Omnibus (GEO) [29] resources across multi-species and tissue/fluid sources in cardiovascular disease. Systems-level integration requires not only large-scale data, but mostly comprehensive data at all molecular levels, thereby owning a resource able to summon all of these elements altogether on a simple, comparative and clustered platform is of prime importance. Moreover, as C/VD is part of the Pan-omics Analysis Database (PADB, www.padb.org) framework, its stability over-time and regular update is ensured, which is a real problem that many databases face after publication. Network representation of microRNAs as node triangles, proteins/genes as circles, enzymes displayed as round rectangles, compounds as hexagons, and reactions shaped as diamonds, labels colour coded as irreversible/directed: orange, reversible/bidirected: purple. Enrichment using protein-protein physical interactions is derived from STRING with an established minimum confidence score of 0.70. Gene-enzyme-reaction-compound associations were established using MetScape 3.1.3 and KEGG. MicroRNA-targets associations were derived using miRanda and TarBase. Differential expression is represented as a colour gradation from blue (decreased expression) to red (increased expression). Node size is proportional to the number of molecules reported within data sets. Some compounds and microRNAs of the network were removed (based on fold-change criteria) to reduce the complexity of the figure and thereby enhance visualisation. Full network figure is available in S10 Databases covering cardiovascular disease molecular data exist, in the vast majority only covering single-omics derived data, with prevalence for gene-array data sourced from GEO, furthermore many are either offline or their domain redirects to other unrelated webpages. Table 3. 
Over-representation and pathway topology analysis using KEGG pathways terms. The analysis is based on the molecular elements that constitute the interactome of CAD. Topology analysis is based on degree centrality, which measures the number of links that connects to a node within a pathway. Regulation trend (Trend) is based on differential expression of either � compounds, or �� gene\proteins, or ��� both. Down-regulation (#), up-regulation ("), contradictory regulation or not applicable (NA). General pathway association (class): fatty acids (FA), amino acids (AA), aminoacylation of transfer RNAs (AAcylation), signalling (SIG), energy metabolism (EMET), vitamin 5 as precursor (Vit5), disease (DIS), immune system (IMM), antioxidants (AntiOx), hormonal (HOR), and cytoskeleton (Cytosk). Gene-centric or metabolite-centric analysis (|).

Pathway Total Hits P.Value Topology Gene\compound Class Trend
Biosynthesis of unsaturated fatty acids|hsa01040  CardioGenBase [61] displays gene-array data sourced from literature and GEO and its currently offline (verified on 05/07/2018), another database comprising literature-based candidate CAD genes, CADgene, [62] and its database live status is currently offline (verified on 05/07/ 2018, redirects to another unrelated web-page), likewise, In-Cardiome [63] deals with ownprivate and literature-based gene expression data, as well with clinical associations and drugs; Chemogenomics [64] is a literature-based database aiming to explore more extensively the pharmacological side in cardiovascular disease handling protein targets and small ligands, and is currently online (verified on 05/07/2018), but requires registration; COPaKB [65] mimics The Human Protein Atlas database [66] with a subset of data regarding cardiovascular disease and also GEO data, and its live status is online (verified on 05/07/2018). The generated interactome in coronary disease (CAD) based on initial meta-analysis and further systems-level integrative analysis of Omics studies covering circulatory levels of protein, metabolite and microRNA differential expression spotted disturbed profiles in lipid metabolism, in particular down-regulation of the unsaturated fatty acids (FA) biosynthesis, down-regulation of cholesterol binding and transport ability shown by the apolipoprotein containing cluster, phosphatidylcholine-sterol acyltransferase (LCAT) and (FABP1), the involvement of the proliferator-activated receptor (PPAR) signalling pathway, with particular emphasis on the generalised trend for down-regulation of the peroxisome proliferator-activated receptor gamma (PPARG), fatty acid synthase (FASN) and their enzyme product, hexadecanoic acid (palmitic acid). This is in synergy with putative post-transcriptional regulation of miR-1305 on PPARA and APOA1, a main apolipoprotein cluster network hub, and dual regulation of miR-1303 and miR-1305 of FASN. 
On the other hand, there was indication of up-regulation of molecular elements involved in extracellular matrix (ECM)-receptor interaction, in the activation of inflammation in the original source tissue, and in cardiac hypertrophy.
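As an illustration of how a "generalised trend" across several studies could be summarised, the following minimal sketch applies a simple majority vote over the sign of reported fold-changes. This is an assumption made for illustration only and is not necessarily the procedure used to build the database; the function name `consensus_trend`, the molecule labels and all numerical values are hypothetical.

```python
from collections import Counter


def consensus_trend(log2_fold_changes):
    """Summarise the direction of regulation across studies.

    Returns 'up' or 'down' when one direction is reported more often,
    and 'mixed' when reports are tied or no non-zero value is available.
    """
    votes = Counter(
        "up" if fc > 0 else "down" for fc in log2_fold_changes if fc != 0
    )
    up, down = votes["up"], votes["down"]
    if up == down:
        return "mixed"
    return "up" if up > down else "down"


# Hypothetical log2 fold-changes per study (illustrative values only,
# not taken from the C/VD database).
reports = {
    "MOL_A": [-0.8, -1.1, 0.2],
    "MOL_B": [0.5, 0.9],
    "MOL_C": [-0.4, 0.4],
}

trends = {name: consensus_trend(values) for name, values in reports.items()}
print(trends)
```

A vote count ignores effect size and sample size; a weighted meta-analysis would be a natural refinement when those descriptors are available.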
Peroxisome proliferator-activated receptors (PPARs) alpha, beta/delta and gamma are ligand-activated transcription factors that share substantial homology, 60% to 80%, in their ligand- and DNA-binding domains [67], despite playing different roles in the modulation of energy metabolism [68]. The three PPAR isoforms are expressed in the heart, including the vasculature in endothelial [69,70] and smooth muscle cells [71,72], but their roles in the cardiovascular system and their outcomes in disease are still not well established owing to major inter-study variability [73]. The activation or repression of PPAR target gene expression is modulated by transactivation or transrepression mechanisms [74]. In transactivation, binding of ligands to PPAR leads to the establishment of heterodimers with the nuclear retinoid X receptor (RXR), followed by translocation of the PPAR-RXR heterodimer complex to the nucleus, subsequent assembly of co-activator (e.g. PPARGC1A) complexes [75], and binding to specific PPAR response elements (PPREs) in the promoter region of target genes to induce their expression [70,76]. In contrast, in the absence of ligands, the PPAR-RXR heterodimer complex recruits co-repressors (e.g. NCOR1 and NCOR2) that repress gene expression in a DNA-binding-independent manner. When a ligand is present and binds, PPAR-RXR induces dissociation of the co-repressor complex and subsequent release of the co-repressor elements [70,76,77].
PPARs are implicated in a plethora of cardiovascular disorders, such as cardiac hypertrophy, atherosclerosis and heart failure, as well as in conditions that constitute risk factors, such as diabetes mellitus, obesity, hypertension and dyslipidemia [78].
Among the post-transcriptional regulators, hsa-miR-1303 showed an up-regulated profile when human umbilical vein endothelial cells (HUVECs) were exposed to estradiol [93], and a down-regulation signature when human-induced pluripotent stem cell-derived cardiomyocytes were exposed to doxorubicin [94]. Furthermore, hsa-miR-1305 levels in circulating monocytes were found to be increased after acute exercise as a stress signal in healthy volunteers [95]; hsa-let-7c has been associated with cardiac hypertrophy [96]; and the mature sequence hsa-let-7c-5p appears to be associated with hyperglycemia in patients with coronary heart disease (CHD) [97].
Most patients with CAD undergo treatment for the clinical management of other CAD risk factors, such as control of low-density lipoprotein (LDL) cholesterol levels, triglycerides and blood pressure, and it is therefore likely that this additional treatment contributes to the global interactome profile highlighted above. Thus, we should emphasise the potential role of statins (3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase inhibitors), a group of drugs that are widely used to reduce blood levels of LDL cholesterol and triglycerides [98] and that have also been found to increase the expression of PPARGC1A [99], a co-activator of PPARG. Other agonists, such as fibrate drugs, are able to diminish the activity of LCAT in plasma [100].
The CAD showcase scenario presented here to demonstrate the utility of the C/VD database has some limitations. It used only data sets screening circulatory molecules in human cohorts; therefore, potential disease mechanisms disclosed in experimental studies using cell lines and animal models mimicking CAD, as well as processes mediated in specific tissue types, could not be fully captured. Moreover, we selected only data sets dealing with the clinical description of CAD, leaving out early events seen in conditions such as atherosclerosis and late diseases such as myocardial infarction. From a molecular perspective, this can give an incomplete view of the molecular events underlying the disease and thus lead to a partial description of CAD.

Conclusions and future perspectives
The C/VD database can either assist in the development of novel hypotheses in a data-driven manner or serve as a source of literature-based knowledge in the cardiovascular research field.
In the CAD showcase, using a combinatorial systems-biology approach based on the integration of data from circulatory molecular expression profiles covering microRNAs, proteins and metabolites, we provide insights into the biological aspects of the disease. These include summary descriptions by gene ontology (GO) and pathway terminology, mechanistic aspects recovered through the recreation of protein-protein interactions (PPIs) and metabolic reactions, and regulatory elements exerted by transcription factors (TFs) and by microRNA post-transcriptional regulation. In CAD, we observed a globally disturbed profile in lipid metabolism, including the biosynthesis of fatty acids (FA), with a blunted ability for cholesterol binding and transport, as well as the involvement of the peroxisome proliferator-activated receptor (PPAR) signalling pathway, with PPAR isoform alpha (PPARA) being potentially post-transcriptionally regulated by miR-1305, resulting in reduced expression of the PPARA target genes APOA1 and FABP1. Furthermore, we could show indication of disrupted biological processes with a distinct increase in the expression of molecular elements involved in the activation of inflammation, extracellular matrix (ECM)-receptor interaction, and cardiac hypertrophy.
We foresee regular updates that populate the C/VD database with more experimental studies, since within the cardiovascular research field large-scale untargeted approaches screening body-accessible fluids are currently in high demand.

Most significant enriched node terms are displayed darker. Colour-filled compounds and proteins/genes are from CAD datasets (with the exception of ACE, which is derived from DisGeNET). To improve network visualisation, some ontology terms were removed from the figure, such as pyruvate metabolism, glucagon signaling pathway, and glycolysis/gluconeogenesis, which have a grey start annotation. (TIF)

S10 Fig. Full network representation of the coronary artery disease (CAD) interactome. MicroRNAs are represented as triangles, proteins/genes as circles, enzymes as rounded rectangles, compounds as hexagons, and reactions as diamonds, with labels colour-coded as irreversible/directed (orange) or reversible/bidirected (purple). Enrichment using protein-protein physical interactions is derived from STRING with an established minimum confidence score of 0.70. Gene-enzyme-reaction-compound associations were established using MetScape 3.1.3 and KEGG. MicroRNA-target associations were derived using miRanda and TarBase. Differential expression is represented as a colour gradation from blue (decreased expression) to red (increased expression). Node size is proportional to the number of molecules reported within data sets.
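The visual encoding described in the S10 Fig caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the figure: the function names, the white midpoint of the colour scale, the saturation limit of the fold-change and the node-size constants are all hypothetical, and confidence scores are assumed to be on a 0 to 1 scale (STRING download files report the combined score on a 0 to 1000 scale, so 0.70 would correspond to 700).

```python
def filter_edges(edges, min_score=0.70):
    """Keep interactions whose confidence score meets the minimum threshold."""
    return [(a, b, score) for a, b, score in edges if score >= min_score]


def expression_colour(log2_fc, limit=2.0):
    """Map a log2 fold-change to a hex colour.

    Blue encodes decreased expression and red increased expression;
    the white midpoint and the saturation limit are assumptions.
    """
    x = max(-limit, min(limit, log2_fc)) / limit  # clamp and scale to [-1, 1]
    if x >= 0:
        r, g, b = 255, round(255 * (1 - x)), round(255 * (1 - x))
    else:
        r, g, b = round(255 * (1 + x)), round(255 * (1 + x)), 255
    return f"#{r:02x}{g:02x}{b:02x}"


def node_size(n_reports, base=10.0, step=5.0):
    """Node size grows linearly with the number of reports (hypothetical constants)."""
    return base + step * n_reports


# Hypothetical edges: (node, node, confidence score on a 0-1 scale).
edges = [("P1", "P2", 0.90), ("P1", "P3", 0.40), ("P2", "P3", 0.70)]
kept = filter_edges(edges)
print(kept)
print(expression_colour(-5.0), expression_colour(0.0), expression_colour(2.0))
print(node_size(3))
```

Clamping the fold-change before mapping keeps a single extreme value from compressing the colour range of all other nodes.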