A simple, scalable approach to building a cross-platform transcriptome atlas

doi:10.1371/journal.pcbi.1008219

Fig 1.

The blood atlas is constructed by integrating many independent curated datasets.

Top row: the individual PCAs of a set of quality-controlled independent datasets. These datasets are measured on different platforms. Middle row: genes are rank transformed to move the expression distributions from the different platforms onto the same distribution. However, a PCA on the transformed data still shows clustering by platform. Bottom row: genes are assessed univariately for platform dependence and filtered to keep only those with a low fraction of variance dependent upon platform. The resulting PCA then shows clustering based on biological features.
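The three rows of the figure can be read as a small pipeline: rank transform, per-gene platform-dependence score, filter, PCA. The sketch below illustrates that flow on synthetic data. It is not the authors' implementation: the function names, the synthetic matrix, and the use of a simple between-platform sum of squares over total sum of squares as the "platform variance fraction" are all assumptions made for illustration. The 0.2 cut-off is borrowed from panel D of Fig 3.

```python
import numpy as np

def rank_transform(expr):
    """Rank genes within each sample (column) and scale the ranks to (0, 1].

    expr is a genes x samples matrix. Ties are broken by position, which is
    a simplification.
    """
    ranks = expr.argsort(axis=0).argsort(axis=0) + 1
    return ranks / expr.shape[0]

def platform_variance_fraction(values, platforms):
    """Per-gene share of variance attributable to platform.

    Assumption: between-platform sum of squares divided by total sum of
    squares, used here as a simple stand-in for a variance model.
    """
    platforms = np.asarray(platforms)
    grand_mean = values.mean(axis=1)
    total_ss = ((values - grand_mean[:, None]) ** 2).sum(axis=1)
    between_ss = np.zeros(values.shape[0])
    for p in np.unique(platforms):
        cols = values[:, platforms == p]
        between_ss += cols.shape[1] * (cols.mean(axis=1) - grand_mean) ** 2
    return np.divide(between_ss, total_ss,
                     out=np.zeros_like(between_ss), where=total_ss > 0)

def pca_coordinates(values, n_components=2):
    """Sample coordinates on the leading principal components (via SVD)."""
    centred = (values - values.mean(axis=1, keepdims=True)).T  # samples x genes
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Synthetic example: 200 genes, 12 samples, two hypothetical platforms.
rng = np.random.default_rng(0)
platforms = np.array(["A"] * 6 + ["B"] * 6)
expr = rng.normal(size=(200, 12))
expr[:50, platforms == "B"] += 5.0   # first 50 genes carry a platform effect

ranked = rank_transform(expr)
fraction = platform_variance_fraction(ranked, platforms)
keep = fraction < 0.2                # hypothetical threshold, as in Fig 3D
coords = pca_coordinates(ranked[keep])
```

In this toy setting the genes given an artificial platform offset receive a higher score on average than the remaining genes, so filtering on the score removes most of them before the final PCA.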

Fig 2.

The S4M blood atlas.

The S4M blood atlas integrates samples from 38 independent datasets. Each point is a sample, and 3700 genes are used in construction of the PCA. The colour indicates the annotated cell type. Progenitors sit in a region in the corner, while the myeloid and lymphocyte arms separate out. The lymphocyte region includes both T and B cells. Dendritic cells derived from an in vivo source sit in the cloud in the center, while cord-blood-derived DCs sit together in a group.

Fig 3.

The PCA coordinates of blood atlas samples after filtering genes with a decreasing platform variance fraction threshold.

In panel A the threshold is 0.8, in B it is 0.6, in C it is 0.4, and in D it is 0.2. As the threshold is lowered, clusters of samples initially separated by platform merge and form new clusters independent of platform.
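As a rough illustration of what lowering the threshold does to the gene set, the sketch below counts how many genes survive at each of the four panel thresholds. The per-gene platform variance fractions are drawn from a beta distribution purely as placeholder values, and the strict "less than" comparison is an assumption about how the filter is applied.

```python
import numpy as np

# Placeholder per-gene platform variance fractions in [0, 1] (not real data).
rng = np.random.default_rng(1)
platform_fraction = rng.beta(a=1.0, b=2.0, size=10_000)

thresholds = [0.8, 0.6, 0.4, 0.2]    # panels A to D
retained = {}
for t in thresholds:
    # Keep genes whose platform-dependent share of variance is below t.
    retained[t] = int((platform_fraction < t).sum())

for t in thresholds:
    print(f"threshold {t}: {retained[t]} genes retained")
```

The retained sets are nested: every gene kept at 0.2 is also kept at 0.4, 0.6 and 0.8, so each panel uses a subset of the genes in the previous one.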

Fig 4.

Repeated application of the gene filtering and PCA upon annotated blood samples.

These panels show the results of repeated application of gene filtering and PCA upon the annotated blood samples in S4M. Each point is a sample, with colour indicating annotated cell type (left column) or cluster identity (right column). The top left shows application to all blood samples as in Fig 2, while the top right shows the robust clusters defined upon these coordinates. The identities of these clusters are in Supplementary S2 Table. Their highlighted colours are propagated to the middle and bottom panels to display the behaviour of these clusters under recursive application. The middle row shows the PCA when variance modelling and filtering are applied only to the myeloid lineage clusters (2, 3, 4, 5 and 6). The myeloid PCA shows the clusters defining monocytes, macrophages and dendritic cells separating into distinct regions. The bottom row shows the variance modelling, filtering and PCA upon the lymphoid lineage clusters (1 and 3). The increased resolution now splits lymphocyte cluster 1 into more detailed subsets containing either T or B cells.

Fig 5.

New and independent samples may be projected onto the Blood Atlas.

Blood data from two external datasets, Haemosphere (https://www.haemosphere.org/) and van Galen scRNA-Seq [26], are projected onto the Blood Atlas. The location of the projected data is consistent with the Blood Atlas.
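Projection here can be understood as reusing the atlas PCA rather than refitting it: external samples are centred with the atlas gene means and multiplied by the atlas loadings. The sketch below shows that idea with NumPy on random placeholder matrices. The function names and data are hypothetical, and it assumes the external samples have already been restricted to the atlas genes and transformed in the same way as the atlas.

```python
import numpy as np

def fit_atlas_pca(atlas, n_components=3):
    """Fit a PCA on a genes x samples atlas matrix.

    Returns the per-gene means, the loadings (genes x components) and the
    atlas sample coordinates (samples x components).
    """
    gene_means = atlas.mean(axis=1, keepdims=True)
    centred = (atlas - gene_means).T               # samples x genes
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    loadings = vt[:n_components].T
    return gene_means, loadings, centred @ loadings

def project(new_samples, gene_means, loadings):
    """Place new samples (genes x samples) in the existing atlas coordinates."""
    return (new_samples - gene_means).T @ loadings

# Placeholder data: 50 genes, 20 atlas samples, 4 external samples.
rng = np.random.default_rng(2)
atlas = rng.random((50, 20))
external = rng.random((50, 4))

gene_means, loadings, atlas_coords = fit_atlas_pca(atlas)
external_coords = project(external, gene_means, loadings)
```

Because the atlas means and loadings are held fixed, adding external samples does not move any existing atlas point. Projecting the atlas samples themselves returns their original coordinates.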
