Fig 1.
Overview of the ORSO framework.
ORSO constructs a data-driven network from social interactions and identified similarities between datasets. (1) Both validated public data and user-submitted data are hosted by ORSO. (2) Metadata and primary read coverage values are used to construct a data network, where connections represent similarities between datasets. (3) Using these similarities, ORSO recommends individual datasets to a user according to that user’s interests. (4) User interests are gauged by social interactions with datasets, such as favoriting and following. These interactions are in turn used to connect datasets in the network and influence the data recommended to other users.
Fig 2.
Recapitulation of biological associations in ORSO network and PCA views.
All plots were taken from ORSO without modification, except for label overlays. (A) PCA view of human RNA-seq data (hg19 assembly; 1,180 ENCODE datasets). The PCA was constructed considering read coverage values across models of mRNA transcripts. Similar cell types cluster in the same location in PCA space. (B) Network view of human RNA-seq data (805 experiments). Network topologies reflect similarities across cell types. The network layout was generated using a force-directed algorithm that minimizes the distances between connected nodes. (C) Dendrogram view of human RNA-seq data (805 ENCODE experiments). Network similarities were used in hierarchical clustering to create a dendrogram of biologically relevant cell type clusters. (D) PCA view of human ChIP-seq data (hg19 assembly; 4,502 ENCODE datasets). Similar protein targets, including histone modifications, are grouped together in a PCA created using promoter read coverage values. (E) Co-localization of histone modifications associated with active genomic regions in the human ChIP-seq PCA. (F) Co-localization of histone modifications with relevant protein targets in the human ChIP-seq PCA.
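The hierarchical clustering step behind the dendrogram in panel C can be illustrated with a minimal sketch. This is not ORSO's implementation: the caption does not state the linkage method, so average linkage is assumed here, and the dataset labels and similarity values are hypothetical.

```python
from itertools import combinations


def average_linkage(names, similarity):
    """Agglomerative clustering over pairwise similarities.

    names: list of dataset labels (hypothetical examples below).
    similarity: dict mapping frozenset({a, b}) to a similarity score.
    Returns the merges in the order they occur, as
    (cluster_1, cluster_2, average_similarity) tuples.
    """
    clusters = [(name,) for name in names]
    merges = []

    def cluster_similarity(c1, c2):
        # Average pairwise similarity between members of two clusters.
        total = sum(similarity[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the two clusters that are most similar on average.
        c1, c2 = max(
            combinations(clusters, 2),
            key=lambda pair: cluster_similarity(*pair),
        )
        merges.append((c1, c2, cluster_similarity(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 + c2)
    return merges


# Hypothetical labels and similarity scores, for illustration only.
NAMES = ["hESC_1", "hESC_2", "heart_1", "heart_2"]
SIMILARITY = {
    frozenset(pair): score
    for pair, score in [
        (("hESC_1", "hESC_2"), 0.9),
        (("heart_1", "heart_2"), 0.8),
        (("hESC_1", "heart_1"), 0.1),
        (("hESC_1", "heart_2"), 0.2),
        (("hESC_2", "heart_1"), 0.2),
        (("hESC_2", "heart_2"), 0.1),
    ]
}

if __name__ == "__main__":
    for left, right, score in average_linkage(NAMES, SIMILARITY):
        print(left, "+", right, f"(similarity {score:.2f})")
```

With these toy values, the two hESC datasets merge first, then the two heart datasets, and the two resulting clusters join last. That merge order is what a dendrogram draws from the leaves upward.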
Fig 3.
Application of ORSO to RNA-seq data from an hESC-to-cardiomyocyte differentiation time course.
All plots were taken from ORSO without modification, except for label and transparency overlays. (A) Schematic describing the differentiation time course. (B) Differentiation datasets after integration in the human RNA-seq PCA. Early timepoints co-localize with hESCs, while later timepoints co-localize with heart muscle samples. (C) Network view after integration of time course data with 805 ENCODE experiments. Localization of timepoints near hESC and heart data points reflects similarities predicted by the ORSO recommendation system.